We show that the pseudoflow algorithm for maximum flow is particularly efficient for the bipartite
matching problem both in theory and in practice. We develop several implementations of the pseudoflow
algorithm for bipartite matching, and compare them over a wide set of benchmark instances to
state-of-the-art implementations of push-relabel and augmenting path algorithms that are specifically designed
to solve these problems. The experiments show that the pseudoflow variants are in most cases faster
than the other algorithms.

We also show that one particular implementation, the matching pseudoflow algorithm, is theoretically efficient. For a graph with $n$ nodes, $m$ arcs, $n_1$ the size of the smaller set in the bipartition, and maximum matching value $\kappa \le n_1$, the algorithm's complexity, given input in the form of adjacency lists, is $O(\min\{n_1\kappa, m\} + \sqrt{\kappa}\,\min\{\kappa^2, m\})$. Similar algorithmic ideas are shown to work for an adaptation of Hopcroft and Karp's bipartite matching algorithm with the same complexity. Using boolean operations on words of size $\lambda$, the complexity of the pseudoflow algorithm is further improved.



1 Introduction 

The bipartite matching problem is to find, in a given bipartite graph $B = (V_1; V_2, E)$, a matching containing a maximum number of edges, that is, a collection of edges $M \subseteq E$ such that each node is adjacent to at most one of the edges in the matching $M$. For a survey of the early literature on this problem, the reader is referred to the book by Lawler [53], Chapter 5.

The bipartite matching problem is equivalent to the maximum flow problem on an associated simple bipartite network. (A network is said to be simple if every node has a throughput capacity of 1 unit of flow.) Therefore, any maximum flow algorithm can be used to solve the bipartite matching problem. The network is constructed by adding source and sink nodes $s$ and $t$, linking the source to all nodes of $V_1$ with arcs of capacity 1 and all nodes of $V_2$ to the sink with arcs of capacity 1, and directing all edges in the bipartite graph from $V_1$ to $V_2$ with capacity $\ge 1$. Such a network is shown in Figure 1. The maximum $s,t$-flow on this associated network corresponds to a solution to the maximum matching problem: an edge $(i, j)$ with $i \in V_1$, $j \in V_2$ is in the matching if and only if the corresponding arc has a flow of one unit on it.
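
The construction above can be illustrated with a minimal sketch. The function names, the node encoding (`("u", i)` for $V_1$-nodes, `("v", j)` for $V_2$-nodes), and the choice of capacity 1 on the middle arcs are assumptions made for this illustration only; they are not taken from the implementations discussed in this paper.

```python
def build_matching_network(n1, n2, edges):
    """Build the simple bipartite flow network as a capacity dictionary.

    n1, n2 : sizes of V1 and V2
    edges  : iterable of (i, j) pairs with i in range(n1), j in range(n2)
    Returns (source, sink, cap) where cap maps arcs (tail, head) to capacities.
    """
    s, t = "s", "t"
    cap = {}
    for i in range(n1):
        cap[(s, ("u", i))] = 1            # source-adjacent arcs, capacity 1
    for j in range(n2):
        cap[(("v", j), t)] = 1            # sink-adjacent arcs, capacity 1
    for i, j in edges:
        cap[(("u", i), ("v", j))] = 1     # any capacity >= 1 would do here
    return s, t, cap


def matching_from_flow(flow):
    """Read off the matching: middle arcs that carry one unit of flow."""
    return sorted(
        (a[1], b[1])
        for (a, b), f in flow.items()
        if a != "s" and b != "t" and f == 1
    )
```

Any maximum flow routine can then be run on `cap`; `matching_from_flow` only interprets the resulting flow values.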



Besides bipartite matching, other well-known problems are solved as maximum flow on simple bipartite networks. These include the vertex cover problem on a bipartite graph and the independent set problem, also known as the stable set problem, on a bipartite graph. We refer to the maximum flow algorithm for simple bipartite graphs as the bipartite matching algorithm although it applies to these problems as well.

Dinic's [14] maximum flow algorithm is particularly efficient for simple networks, as demonstrated by Even and Tarjan [15]. For bipartite graphs, the running time is $O(\sqrt{n_1}\,m)$, where $n_1 = |V_1|$ (w.l.o.g. $n_1 \le n_2 = |V_2|$). Hopcroft and Karp ^T] proposed an algorithm for bipartite matching with complexity $O(\sqrt{\kappa}\,m)$, where $\kappa$ is the cardinality of the maximum matching, which is bounded by $n_1$. Their algorithm is, in essence, the same as Dinic's algorithm adapted to bipartite matching. Feder and Motwani [16] obtained a bound of $O(\sqrt{n}\,m^*)$ for the bipartite matching algorithm that relies on speeding up Dinic's algorithm using graph compression. In the complexity expression, $m^*$ is the number of edges in the compressed graph, which is less than $m$ by about a factor of $\log n$. Using boolean word operations on $\lambda$-bit words, Cheriyan and Mehlhorn [11] obtained a bound of $O(n^{2.5}/\lambda)$, while Alt et al. [S] obtained a bound of $O(n^{1.5}\sqrt{m/\log n})$ (which is better than $O(n^{2.5}/\lambda)$ for sparse graphs and better than $O(\sqrt{n}\,m)$ for dense graphs). Mucha and Sankowski [21] described a randomized algorithm for matching in general (non-bipartite) graphs that runs in $O(n^{\omega})$, where $\omega \approx 2.38$ is the exponent of the best known matrix multiplication algorithm.

However, the theoretically efficient algorithms listed above tend to perform poorly in practice. Setubal [551 US] showed that in practice, implementations of the push-relabel algorithm of Goldberg and Tarjan [18] were faster than those of Dinic's algorithm as well as the algorithm of Alt et al. Cherkassky et al. [12] developed several implementations of push-relabel and performed extensive experiments on several benchmark instances, showing push-relabel to be the fastest in practice.

In this paper, we apply the pseudoflow algorithm of Hochbaum [19, 20] to bipartite matching and examine
its theoretical and practical performance. The pseudoflow algorithm was recently shown by Chandran and 
Hochbaum [5] to be the fastest algorithm in practice for the maximum flow problem, and by Hochbaum and 
Orlin to be as efficient as the push-relabel algorithm in theory; hence, it is reasonable to suspect that the 
pseudoflow algorithm is efficient for bipartite matching as well. The major contributions of our work are as 
follows. 

1. We develop several implementations of the pseudoflow algorithm specifically for bipartite matching
and show that they are faster than state-of-the-art implementations of push-relabel for bipartite matching.
We use the results of the experiments to gain insights into the differences between the pseudoflow and 
push-relabel algorithms. 




Figure 1: Flow graph for bipartite matching. 
2. We show that a variant of the pseudoflow algorithm, called the matching-pseudoflow algorithm, runs
on a bipartite simple network in time $O(\min\{n_1\kappa, m\} + \sqrt{\kappa}\,\min\{\kappa^2, m\})$. We then show that the
insights generated from this approach allow one to modify either Hopcroft and Karp's algorithm or the
push-relabel maximum flow algorithm and achieve the same complexity. Using boolean operations on
$\lambda$-bit words, we show that the complexity of the matching-pseudoflow algorithm can be further improved

to $O\left(\min\left\{m, n_1\kappa, \frac{n_1 n_2}{\lambda}\right\} + \kappa^2 + \frac{\kappa^{2.5}}{\lambda}\right)$.

Since the matching-pseudoflow algorithm could be viewed as a superior implementation of Dinic's 
algorithm, we compare the performance of the matching-pseudoflow to the best-known implementation 
of Dinic's algorithm to understand and quantify the key differences between the two algorithms. 

2 Description of the pseudoflow algorithm 

The pseudoflow algorithm and its properties are described in detail in Hochbaum [20]. The description is
repeated here for completeness. 

2.1 Preliminaries 

Let $G_{st}$ be a graph $(V \cup \{s,t\}, A \cup A_s \cup A_t)$, where $A_s$ and $A_t$ are the source-adjacent and sink-adjacent arcs respectively.

A flow vector $f = \{f_{ij}\}_{(i,j) \in A \cup A_s \cup A_t}$ is said to be feasible if it satisfies

1. Flow balance constraints: for each $i \in V$, $\sum_{(k,i) \in A \cup A_s \cup A_t} f_{ki} = \sum_{(i,j) \in A \cup A_s \cup A_t} f_{ij}$ (i.e., inflow($i$) = outflow($i$)), and

2. Capacity constraints: the flow value is between the lower bound and upper bound capacity of the arc, i.e., $\ell_{ij} \le f_{ij} \le u_{ij}$. Without loss of generality, we assume henceforth that $\ell_{ij} = 0$ (e.g., Ahuja et al. [3], pages 191-196).

A maximum flow is a feasible flow $f^*$ that maximizes the flow out of the source (or into the sink). The value of the maximum flow is $\sum_{(s,i) \in A_s} f^*_{si}$.

Given a flow vector $f$ in $G_{st}$ that is feasible, the residual graph $G^f = (V \cup \{s,t\}, A^f)$ is constructed as follows: for each arc $(i,j) \in A \cup A_s \cup A_t$ with flow $f_{ij}$ and capacity $c_{ij}$, $A^f$ contains two arcs: $(i,j)$ with capacity $c_{ij} - f_{ij}$ and $(j,i)$ with capacity $f_{ij}$. The capacities of arcs in $A^f$ are referred to as the residual capacities with respect to flow $f$, and are denoted by $c^f_{ij}$. An $s,t$-cut in the graph is a bi-partition of nodes into two disjoint sets, one containing the source and the other containing the sink. One property of the residual graph is that the bipartition of nodes of the minimum $s,t$-cut of $G^f$ is the same as that in $G$ (e.g., Ahuja et al. [3], pages 44-46).

A pseudoflow $f$ is a flow vector that satisfies capacity constraints, but may violate flow balance at any node. The excess of a node $v \in V$ is the inflow into that node minus the outflow, denoted by $e(v) = \sum_{(u,v) \in A \cup A_s \cup A_t} f_{uv} - \sum_{(v,w) \in A \cup A_s \cup A_t} f_{vw}$. A negative excess is called a deficit.
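
As a small illustration of the excess definition, the following sketch computes inflow minus outflow for every node of a pseudoflow stored as a dictionary. The dictionary representation and the function name `excesses` are assumptions of this example.

```python
from collections import defaultdict


def excesses(flow):
    """Return e(v) = inflow(v) - outflow(v) for every node touched by `flow`.

    flow : dict mapping arcs (u, v) to the flow value f_uv
    A positive value is an excess, a negative value is a deficit.
    """
    e = defaultdict(int)
    for (u, v), f in flow.items():
        e[v] += f   # f_uv enters v
        e[u] -= f   # f_uv leaves u
    return dict(e)
```

For example, saturating the arcs $(s,a)$ and $(b,t)$ while leaving $(a,b)$ empty gives node $a$ an excess of 1 and node $b$ a deficit of 1.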

A tree $T = (V, E)$ is a connected, undirected, acyclic graph. A rooted tree has a distinguished node $r$ called the root. For each edge $[u, v]$, $u$ is said to be the parent of $v$ if $u$ is closer to the root than $v$, and is denoted by parent($v$). Node $v$ is then called the child of $u$, and is denoted by child($u$). The only node in the tree that does not have a parent is the root. A node $v$ is said to be an ancestor of a node $u$ if $v$ lies along the unique path from $u$ to the root; node $u$ is then said to be a descendant of node $v$. For convenience, we will assume that the tree points topologically "downward" with the root at the "top" of the tree, and each node "below" its ancestors. A branch rooted at some node $r$ is a sub-graph of the tree that contains $r$ and all its descendants in the tree. A rooted sub-tree is a connected sub-graph of the given tree (unlike a branch, it need not contain all the descendants of its root).

An arc that carries a flow equal to its upper bound is said to be saturated. The pseudoflow algorithm maintains a flow that saturates source-adjacent and sink-adjacent arcs throughout the algorithm. Consequently, the source and sink have no further role in the algorithm and are contracted into a single node $r$ that "keeps track" of the excesses and deficits of the nodes in $V$ by adding excess and deficit arcs as follows:

Figure 2: A schematic description of a normalized tree. Each child of $r$ is the root of a branch.

For each node $v \in V$ with positive excess, we add to the graph an arc $(v,r)$ called an excess arc, and for each node $u \in V$ with negative excess we add an arc $(r,u)$ called a deficit arc. The network thus obtained is referred to as the extended network $G^{ext} = (V \cup \{r\}, A \cup A_r)$, where $A_r$ is the set of excess and deficit arcs.

For a tree $T$, an arc $(u, v)$ is said to be in-tree if the edge $[u, v] \in T$. Arcs that are not in-tree are said to be out-of-tree. Given a pseudoflow $f$ that saturates $A_s$ and $A_t$, a normalized tree is a tree in $G^{ext}$ rooted at $r$ that satisfies the following three properties.

Property 2.1 The nodes that do not satisfy flow balance constraints are the children of r and are the roots 
of their respective branches. 

Property 2.2 The pseudoflow values of f on out-of-tree arcs are at the lower or upper bound capacities of 
the respective arcs. 

Property 2.3 In every branch, all downward residual capacities are strictly positive. 

A schematic description of a normalized tree is shown in Figure 2.

The pseudoflow algorithm starts with any normalized tree and an associated pseudoflow. The generic 
initialization is the simple initialization; source-adjacent and sink-adjacent arcs are saturated while all other 
arcs have zero flow. 

If a node $v$ is both source-adjacent and sink-adjacent, then at least one of the arcs $(s,v)$ or $(v,t)$ can be pre-processed out of the graph by sending a flow of $\min\{c_{sv}, c_{vt}\}$ along the path $s \to v \to t$. This flow eliminates at least one of the arcs $(s,v)$ and $(v,t)$ in the residual graph. We henceforth assume w.l.o.g. that no node is both source-adjacent and sink-adjacent.

The simple initialization creates a set of source-adjacent nodes with excess, and a set of sink-adjacent nodes with deficit. Since all other arcs have zero flow, they are all out-of-tree arcs. Thus, each node is a singleton branch for which it serves as the root, even if it is balanced (with 0-deficit). The simple initialization results in a simple normalized tree, shown in Figure 3.
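
A minimal sketch of the simple initialization follows, assuming the source-adjacent and sink-adjacent capacities are given as dictionaries keyed by node. The names `simple_initialization`, `source_cap`, `sink_cap`, and the use of the string `"r"` for the contracted root are hypothetical and chosen only for this example.

```python
def simple_initialization(nodes, source_cap, sink_cap):
    """Saturate source- and sink-adjacent arcs; all other arcs carry zero flow.

    nodes      : iterable of node identifiers in V
    source_cap : dict v -> capacity of arc (s, v), for source-adjacent nodes
    sink_cap   : dict v -> capacity of arc (v, t), for sink-adjacent nodes
    Returns (excess, parent): the excess of each node and its tree parent.
    """
    nodes = list(nodes)
    # Saturating (s, v) gives v an excess; saturating (v, t) gives v a deficit.
    excess = {v: source_cap.get(v, 0) - sink_cap.get(v, 0) for v in nodes}
    # Every node starts as a singleton branch hanging directly off the root r.
    parent = {v: "r" for v in nodes}
    return excess, parent
```

In the bipartite matching network, every $V_1$-node starts with excess 1 and every $V_2$-node with deficit 1.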

2.2 A labeling pseudoflow algorithm 

In the labeling pseudoflow algorithm, all nodes carry a label $\ell_v$ for all $v \in V$. Initially, all labels are set to the value 1. An iteration of the algorithm consists of identifying a branch with root carrying strictly positive excess, and attempting to push this excess towards the sink through the residual network. The process of pushing excesses towards the sink is performed via a merger. Given a branch with root of label $\ell$ and positive excess, a merger operation consists of identifying a merger arc with positive residual capacity from a node of label $\ell$ within the branch to some node of label $\ell - 1$ in the graph.

Figure 3: A simple normalized tree.

A relabeling of a node is the increase of a node's label by one unit. A node of label $\ell$ is relabeled to $\ell + 1$ if there is no merger arc in the residual graph to a neighbor of label $\ell - 1$, and if all its children in the branch have label at least $\ell + 1$. With these rules, the labels satisfy the following properties.

Lemma 2.1 (Hochbaum [19, 20]) For the labeling pseudoflow algorithm, the labels satisfy:

(a) For every residual arc $(u,v)$, $\ell_u \le \ell_v + 1$.

(b) The labels of nodes are monotone nondecreasing in the downwards direction in each branch.

Corollary 2.1 (Hochbaum [19, 20]) The label assigned to a node throughout the labeling pseudoflow algorithm does not exceed the length of a shortest path to a sink-adjacent node in the residual graph plus the label of the sink-adjacent node. More generally, the positive difference in labels of two nodes does not exceed the length of the residual path between them.

For convenience, we henceforth refer to the branch containing the tail of the merger arc as the "from-branch"
and the one containing the head of the merger arc as the "to-branch". Once a merger arc is identified,
a merger operation is performed on the normalized tree. This consists of adding the merger arc to the 
normalized tree, and removing the arc from the root of the from-branch to the root of the normalized tree. 
The merger operation is shown in Figures 4(a) and (b). At the end of a merger, the tree is not a normalized
tree since it has a non-root node carrying positive excess. The merged branch is now renormalized, a process 
that may create any number of branches out of the merged branch. The process of renormalization of the 
merged branch consists of pushing the excess of the root of the from-branch towards the root of the
to-branch and updating the pseudoflows and excesses. The path from the root of the from-branch to that of
the to-branch is unique since they are nodes in a connected tree. For each edge on this path, the operation 
of pushing the excess from the child to its parent and updating the pseudoflow on the edge is called a push. 
If only a part of the child's excess can be pushed to its parent due to insufficient residual capacity on that 
arc, the child retains some positive excess. The edge to its parent is then removed from the normalized tree 
and an excess arc is added for the child node making it the root (with positive excess) of a branch consisting 
of all nodes below it. This operation, called a split, is shown in Figures 4(b) and 4(c).

If the root of a branch is relabeled to label n at some point in the algorithm, all nodes in this branch have 
label n. By Corollary 2.1, this implies that all deficit nodes are unreachable from nodes of label n. Hence,
all nodes in the branch must be in the source set of a minimum cut, and can be ignored for the remainder 
of the algorithm. Thus, the algorithm terminates when (i) there are no branches with root carrying positive 
excess, or (ii) all such roots have a label of n. 

When the algorithm terminates, we obtain a normalized tree and a pseudoflow where all nodes belonging 
to branches with positive excess (if they exist) have label n. This is not a feasible flow since the normalized 
tree has excess and deficits. However, the normalized tree contains information regarding a minimum cut, 
which is stated in the following theorem. 

Figure 4: (a) Initial normalized tree, (b) Tree obtained after the merger, (c) Re-normalized tree after split
due to insufficient residual capacity on edge [v,u].

Theorem 2.1 (Hochbaum [19, 20]) The source node along with all nodes of label n in the normalized
tree form the source set of a minimum cut while the remaining nodes form the sink set of a minimum cut. 

2.3 Implementation details 

Limiting the number of arc scans: During the labeling algorithm, the arcs adjacent to each node are
examined at most once (see Hochbaum [19, 20]) for each value of the node's label. To implement this, we
maintain a pointer at each node to the arc that was last scanned to find a merger. If any node is visited 
more than once for a given label, the search for mergers resumes from the last scanned arc, thus ensuring 
that each arc is scanned at most once for each label. When a node is relabeled, the pointer is reset to the 
start of its list of adjacent arcs. 
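
The per-node pointer described above can be sketched as follows. The class name `ArcScanner`, its methods, and the `is_merger_arc` callback are hypothetical names introduced for this illustration; the callback stands in for whatever test the implementation uses to recognize a merger arc.

```python
class ArcScanner:
    """Per-node pointer into the adjacency list, reset only on relabel."""

    def __init__(self, adj):
        self.adj = adj                           # node -> list of neighbors
        self.next_arc = {v: 0 for v in adj}      # position of last scanned arc

    def find_merger(self, v, is_merger_arc):
        """Resume scanning v's arcs from the stored pointer.

        Returns the head of the first merger arc found, or None if the
        list is exhausted for the current label of v.
        """
        arcs = self.adj[v]
        i = self.next_arc[v]
        while i < len(arcs):
            w = arcs[i]
            if is_merger_arc(v, w):
                self.next_arc[v] = i             # resume from this arc next time
                return w
            i += 1
        self.next_arc[v] = i                     # nothing left for this label
        return None

    def relabel(self, v):
        """Called when v's label increases: restart at the head of the list."""
        self.next_arc[v] = 0
```

Because the pointer only moves forward between relabels, each arc of a node is inspected a bounded number of times per label value, which is the property the complexity analysis relies on.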

Root management: The labeling algorithm requires that all roots with positive excess and of a particular 
label be available when queried. To achieve this, the roots are maintained in an array of buckets, where a 
bucket contains all roots with positive excess and with a particular label. The order in which roots within a 
bucket are processed for mergers appears to make a difference to the pseudoflow algorithm. Anderson and 
Hochbaum pjj experimented with three branch management policies: 

• FIFO: Each bucket is maintained as a queue; roots are added to the rear of the queue, and roots are 
retrieved from the front of the queue. 

• LIFO: Each bucket is maintained as a stack; roots are added to the top of the stack, and roots are
retrieved from the top of the stack. 

• Wave: This is a variant of the LIFO policy. Each bucket is still maintained as a stack, with roots 
being added to the top of the stack and being retrieved from the top. However, when the excess of a 
root changes while it is in the bucket, it is moved up to the top of the stack.

Note that the wave management policy is the same as the LIFO policy for the lowest label variant
since the excess of a root with positive excess does not change while it is in a bucket. (When a root is 
processed in the lowest label algorithm, all mergers are from a branch with positive excess to one with 
non-positive excess, leaving all other roots with positive excess unchanged.) 
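
The three policies can be sketched with one bucket array, as below. This is an illustrative sketch only: the class `RootBuckets`, its method names, and the use of `collections.deque` are assumptions of this example and not the data structure of the implementations compared in this paper.

```python
from collections import deque


class RootBuckets:
    """Array of buckets indexed by label; each bucket holds strong roots."""

    def __init__(self, max_label, policy="fifo"):
        if policy not in ("fifo", "lifo", "wave"):
            raise ValueError(policy)
        self.policy = policy
        self.buckets = [deque() for _ in range(max_label + 1)]

    def add(self, root, label):
        self.buckets[label].append(root)         # rear of queue / top of stack

    def pop(self, label):
        bucket = self.buckets[label]
        if not bucket:
            return None
        # FIFO takes from the front; LIFO and wave take from the top.
        return bucket.popleft() if self.policy == "fifo" else bucket.pop()

    def excess_changed(self, root, label):
        """Wave policy only: move a root whose excess changed to the top."""
        bucket = self.buckets[label]
        if self.policy == "wave" and root in bucket:
            bucket.remove(root)
            bucket.append(root)
```

Note that `deque.remove` takes time linear in the bucket size; a production implementation would more likely use an intrusive doubly linked list so that the wave move is constant time.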

Gap Relabeling: We use the gap-relabeling heuristic of Derigs and Meier [13], who introduced it in the context of push-relabel. When we process a branch whose root has label $\ell$ and there are no nodes in the graph with label $\ell - 1$, we conclude that the entire branch has no residual paths to the sink and is hence a part of the source set of a min cut. The entire branch can thus be ignored for the rest of the algorithm. In practice, this is achieved by setting the labels of all nodes in that branch to $n$.
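
A minimal sketch of the gap test is given below, assuming the implementation keeps a count of nodes per label. The names `gap_relabel` and `label_count` are hypothetical, and restricting the test to root labels of at least 2 is an assumption of this sketch (labels start at 1, so label 0 is never populated in the generic algorithm).

```python
def gap_relabel(branch_nodes, root_label, labels, label_count, n):
    """Lift a whole branch to label n if no node carries label root_label - 1.

    branch_nodes : nodes of the branch being processed
    root_label   : label of the branch root
    labels       : dict node -> current label (updated in place)
    label_count  : dict label -> number of nodes with that label (updated)
    n            : the label that marks nodes as belonging to the source set
    Returns True if a gap was detected and the branch was lifted.
    """
    # Assumption of this sketch: only test for gaps when root_label >= 2.
    if root_label < 2 or label_count.get(root_label - 1, 0) > 0:
        return False
    for v in branch_nodes:
        label_count[labels[v]] -= 1
        labels[v] = n
        label_count[n] = label_count.get(n, 0) + 1
    return True
```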

The Min-cut Stage refers to all the operations executed until a minimum cut is obtained. 



2.4 Lowest and highest label pseudoflow variants 

In the generic labeling algorithm, the branch with a root carrying positive excess that is selected for processing 
(finding mergers) is chosen arbitrarily. In the lowest label variant, the root carrying positive excess with the 
lowest label is identified and the branch is processed for mergers so long as its root remains the lowest 
labeled root with positive excess. In the highest label variant, the branch that is chosen is the one with root 
of highest label, i.e., at each iteration the root carrying positive excess with highest label is identified and 
that branch is processed. Note that in the lowest label variant, the root of the from-branch has positive 
excess while that of the to-branch has non-positive excess, while in the highest label variant, roots of both 
the from-branch and to-branch could have positive excess. 

3 Complexity of the pseudoflow algorithm for bipartite matching 

We now analyze the complexity of the highest and lowest label pseudoflow algorithms when applied to
bipartite matching. 

Definition 3.1 The algorithm is said to be in phase $\ell$ when nodes of label $\ell$ are being examined for mergers.

Let the cardinality of the maximum matching in $G$ be $\kappa$. Since the graph is bipartite, every alternate node in any path in the network must be a $V_2$-node. The shortest path from any node in the network to a node with strict deficit (i.e., an unmatched $V_2$-node) can contain at most $\kappa$ matched $V_2$-nodes, hence its length is at most $2\kappa$. By Corollary 2.1, this means that the label of each node (and hence the number of phases) for the lowest label algorithm is $O(\kappa)$, while that for the highest label algorithm is $O(n_1)$.

Proposition 3.1 The depth of the normalized tree is $O(\kappa)$.

Proof: Consider a path from a node up to the root of the normalized tree. Since the graph is bipartite, the path is made up of alternating nodes from $V_1$ and $V_2$. Thus, the alternate edges on the path form a valid matching, which bounds the length of the path (and thus the depth of the tree) by $2\kappa$. ■

The implication of the above proposition is that the work done per merger is $O(\kappa)$.

Proposition 3.2 The number of arc scans in the lowest label pseudoflow algorithm for bipartite matching is $O(\min\{\kappa m, n_1^2\kappa\})$.

Proof: Hochbaum [19] showed that each arc is examined $O(1)$ times per phase. Since there are $O(\kappa)$ phases and $m$ arcs, the total number of arc scans is $O(\kappa m)$.

Each time a node is processed, its neighbors are examined in order to find a merger. Since there are $O(n_1)$ nodes in the normalized tree, at most $n_1$ neighbors need to be examined in order to find a merger or determine that no merger exists. Thus, the total number of arc scans is the number of nodes in the normalized tree times the number of arc scans per phase times the number of phases, which is $O(n_1^2\kappa)$. ■

Similarly, for the highest label algorithm, the number of arc scans is $O(\min\{n_1 m, n_1^3\})$.

Following the pseudopolynomial complexity analysis of the generic lowest label pseudoflow algorithm from Hochbaum [19, 20], we get a bound of $O(n_1\kappa)$ on the number of mergers for the lowest label variant. The number of mergers in the highest label pseudoflow algorithm is $O(n_1 m)$ (as shown by Hochbaum [19, 20], the number of mergers is bounded by $m$ times the number of phases).

The total work done in the pseudoflow algorithm is the number of arc scans plus the number of mergers times work per merger (which is $O(\kappa)$ as shown above). Thus, the complexity of the lowest label pseudoflow algorithm for bipartite matching is $O(\min\{\kappa m, n_1^2\kappa\} + n_1\kappa^2)$, while that of the highest label algorithm is $O(\min\{n_1 m, n_1^3\} + n_1 m\kappa)$.


4 The free-arcs pseudoflow algorithm for bipartite matching 

In the free-arcs version of the pseudoflow algorithm, the normalized tree satisfies the following property in
addition to Properties 2.1 through 2.3.

Property 4.1 In every branch, all upward residual capacities are strictly positive. 

The only difference from the previously described pseudoflow algorithm is in the split operation, which
is now initiated if the upward residual capacity of an in-tree arc becomes zero after a push.

The implication of the above property is that the normalized tree contains only "free" arcs, i.e., arcs that
have flow strictly between their lower and upper bounds.

Given a bipartite graph $G = (V_1; V_2, E)$, the flow network is constructed by adding source and sink nodes $s$ and $t$, linking the source to all nodes of $V_1$ with arcs of capacity 1 and all nodes of $V_2$ to the sink with arcs of capacity 1, and directing all edges in the bipartite graph from $V_1$ to $V_2$ with infinite capacity. For the free-arcs algorithm, the infinite capacity on arcs from $V_1$ to $V_2$ implies that all in-tree arcs have unit flow while all out-of-tree arcs have flow equal to the lower bound of zero (the flow on an arc can never be at its upper bound).

Lemma 4.1 The pseudoflow algorithm for bipartite matching can create only four types of branches: two types of strong branches, $ST_1$ and $ST_2$, and two types of weak branches, $WT_1$ and $WT_2$ (as described in Figure 5).

Figure 5: Types of branches that can exist in a normalized tree during execution of the free-arcs pseudoflow
algorithm for bipartite matching.



Proof: The proof is by induction. The inductive assumption applies initially, as in the simple normalized tree all nodes of $V_1$ are $ST_1$ branches and all nodes of $V_2$ are $WT_1$ branches. Given that an iteration starts with these two types of strong branches and two types of weak branches, only four types of mergers are possible, as shown in Figure 6. All these mergers result in one or two of these types of branches, and thus the proof is complete. ■

Figure 6: Types of mergers that can occur in the free-arcs pseudoflow algorithm for bipartite matching.



Each $WT_2$ and $ST_2$ branch contains an edge between a $V_1$-node and a $V_2$-node, and all branches are node-disjoint. Thus, the set of $WT_2$ and $ST_2$ branches represents a valid matching, which leads to the following property.

Property 4.2 The number of $ST_2$ and $WT_2$ branches is bounded by $\kappa$, the cardinality of the maximum matching.



4.1 Complexity of the free-arcs pseudoflow algorithm for bipartite matching 

All the results in Section 3 are still valid, except that the work done per merger is now $O(1)$. The complexity of the free-arcs version of the pseudoflow algorithm is thus the number of arc scans plus the number of mergers, which is $O(\min\{\kappa m, n_1^2\kappa\})$ for the lowest label variant and $O(n_1 m)$ for the highest label variant.

5 The matching-pseudoflow algorithm 

The matching-pseudoflow algorithm is a pseudoflow algorithm with global relabeling and delayed relabeling. 
We first introduce the concept of two-edge distance labels in the graph. 

Definition 5.1 The two-edge distance label of a node is the number of $V_1$-nodes in the shortest path from
that node to a $WT_1$ branch in the residual network.

The notion of two-edge distances in bipartite graphs has been used previously, e.g., by Ahuja et al. [1]. Initially, all labels of nodes in $V_1$, which are $ST_1$ branches, are set to 1 and the labels of nodes in $V_2$, which are $WT_1$ branches, are set to 0. Throughout the algorithm the labels of nodes that form $WT_1$ branches remain 0, as their two-edge distance (to themselves) is 0.

Delayed relabeling means that all possible mergers from lowest labeled strong nodes of label $\ell$ are performed without relabeling the nodes when no merger is found. Once all the nodes of label $\ell$ have been examined for mergers, all the node labels are set to be the shortest two-edge distance to a $WT_1$ branch in the residual graph and the set of all mergers starting from the lowest labeled strong root are again performed. The process of computing all the node distance labels is referred to as global relabeling [17].
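
Global relabeling with two-edge distances can be sketched as a backward breadth-first search from the unmatched $V_2$-nodes. The sketch below works directly on a matching stored in two arrays rather than on the branch structure, which is an assumption of this illustration; `two_edge_labels`, `mate1`, and `mate2` are hypothetical names. It uses the facts that every arc from $V_1$ to $V_2$ has positive residual capacity and that the only residual arc out of a matched $V_2$-node leads to its matched $V_1$-node.

```python
from collections import deque


def two_edge_labels(adj, mate1, mate2):
    """Two-edge distance labels by backward BFS from unmatched V2-nodes.

    adj[u]   : list of V2-neighbors of V1-node u
    mate1[u] : V2-node matched to u, or None
    mate2[v] : V1-node matched to v, or None
    Returns (lab1, lab2); nodes that cannot reach an unmatched V2-node
    keep the label None.
    """
    radj = [[] for _ in mate2]               # reverse adjacency: V2 -> V1
    for u, nbrs in enumerate(adj):
        for v in nbrs:
            radj[v].append(u)
    lab1 = [None] * len(adj)
    lab2 = [None] * len(mate2)
    queue = deque()
    for v, u in enumerate(mate2):
        if u is None:                        # unmatched V2-node: distance 0
            lab2[v] = 0
            queue.append(v)
    while queue:
        v = queue.popleft()
        for u in radj[v]:                    # residual arc u -> v
            if lab1[u] is None:
                lab1[u] = lab2[v] + 1        # one more V1-node on the path
                w = mate1[u]
                if w is not None and lab2[w] is None:
                    lab2[w] = lab1[u]        # matched pair shares one label
                    queue.append(w)
    return lab1, lab2
```

Each arc is inspected at most once, so one global relabeling costs $O(m)$ in this representation.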

We now demonstrate that the two-edge distance labels satisfy properties analogous to (a) and (b) of Lemma 2.1. Property (a) is satisfied by the distance labels of nodes, which are the lengths of the shortest residual path from each node to the sink. The second property, monotonicity (b), is shown to be satisfied next.



Lemma 5.1 The two-edge distance labels satisfy property (b) in Lemma 2.1.

Proof: Two-edge distance labels satisfy that both nodes in a $WT_2$ branch have the same label: In a $WT_2$ branch, all arcs into its root (a $V_2$-node) other than that from its child (a $V_1$-node) carry zero flow; the arc from its child carries a flow of 1 unit. The arc from its child is the only arc with positive residual capacity adjacent to the root. Thus, the root can reach a $WT_1$ node only through its child and the shortest path from the root to a $WT_1$ branch will contain the shortest path from its child to a $WT_1$ branch. So the number of $V_1$-nodes in the shortest path from the root to a $WT_1$ branch will be the same as that in the shortest path from its child to a $WT_1$ branch, ensuring that the root and child have the same two-edge distance label.

A similar argument holds for the $ST_2$ branches. Let $\ell_R$ be the label of the right child and $\ell_L$ be the label of the left child (assume w.l.o.g. that $\ell_R \le \ell_L$). The root of an $ST_2$ branch can reach a $WT_1$ branch only through one of its children, so the two-edge distance of the root from a $WT_1$ branch will be equal to the smaller label of the two children. Since there is a residual arc (of infinite capacity) from the left child to the root, the two-edge distance from the left child to the right is 1. Hence, $\ell_L \le \ell_R + 1$, and the label of the left child is at least equal to and at most one greater than the label of the root. ■

Definition 5.2 Stage $\ell$ of the algorithm is the maximal set of mergers that occur while the shortest two-edge
distance from a strong node to a $WT_1$ branch is $\ell$.

An initialization procedure, equivalent to a stage 1, creates a maximal set of $WT_2$ branches by scanning the neighbors of each $V_1$-node to identify an unmatched $V_2$-node and then performing a merger. Since the cardinality of the maximum matching is at most $\kappa$, we need to scan $O(\kappa)$ neighbors of each $V_1$-node to identify an unmatched $V_2$-node or determine that none exists. Also, each arc is scanned at most once, so the complexity of this procedure is $O(\min\{m, n_1\kappa\})$. At the end of the initialization, the shortest two-edge distance from an $ST_1$ branch to a $WT_1$ branch is at least 2.
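
The initialization can be sketched as a greedy maximal matching. The array-based representation and the names `greedy_initialization`, `mate1`, `mate2`, and `scans` are assumptions of this example; each matched pair corresponds to one $WT_2$ branch in the terminology above.

```python
def greedy_initialization(adj, n2):
    """Greedy maximal matching: match each V1-node to its first free neighbor.

    adj[u] : list of V2-neighbors of V1-node u
    n2     : number of V2-nodes
    Returns (mate1, mate2, scans), where scans counts the arcs examined.
    """
    mate1 = [None] * len(adj)
    mate2 = [None] * n2
    scans = 0
    for u, nbrs in enumerate(adj):
        for v in nbrs:
            scans += 1
            if mate2[v] is None:          # unmatched V2-node found: merge
                mate1[u], mate2[v] = v, u
                break                     # stop scanning once u is matched
    return mate1, mate2, scans
```

The scan for a node stops at the first unmatched neighbor, and since at most $\kappa$ of the $V_2$-nodes can be matched at any time, no node examines more than $\kappa + 1$ arcs.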

We now elaborate on the implementation of a stage. To facilitate the description, we introduce the following notation for labels of nodes in a branch. The labels of an $ST_2$ branch are represented by the triplet (left, root, right), which represent the labels of the left child, root, and right child respectively. We will assume w.l.o.g. that the left child has label greater than or equal to that of the right child. Labels in a $WT_2$ branch are represented by the pair (child, parent), which represent the labels of the child and parent respectively.

At the beginning of each stage, global relabeling is performed, and all nodes are "unflagged", which
marks them as being unvisited. Mergers are allowed only between unvisited nodes.

The merger/split operations at each stage are such that they satisfy the property that the stage begins and ends with only $WT_1$, $ST_1$, and $WT_2$ branches; $ST_2$ branches are only formed temporarily during a stage. This inductive property holds initially for stage 2 since no $ST_2$ branches are formed in stage 1 (the greedy initialization).

Suppose that at the beginning of stage $\ell \ge 2$, the set of branches consists of $ST_1$ branches of label $\ge \ell$; $WT_2$ branches in which both the nodes have the same label $p$ ($1 \le p < \ell$); and $WT_1$ branches which have label 0. Consider a sequence of mergers starting from an $ST_1$ branch of lowest label $\ell$. The first merger is from an $ST_1$ branch of label $\ell$ to an unvisited root of a $WT_2$ branch with label $(\ell - 1, \ell - 1)$. This creates an $ST_2$ branch $(\ell, \ell - 1, \ell - 1)$. This branch now has the lowest labeled strong root, and the search for mergers starts from the right child labeled $\ell - 1$.

Suppose that at some point a merger results in an $ST_2$ branch $(p + 1, p, p)$. The search for mergers now starts from the right child of this branch, resulting in one of the following possible outcomes.

1. There is no merger to an unvisited weak node of label $p - 1$: Here we delay the relabeling of that node to the end of the stage and mark the root of the branch as being visited, implying that the branch cannot participate in any more mergers at the current stage. In this case, a backtrack operation is performed to reverse the last merger and restore the structure of the branches to what it was prior to the last merger. This is shown in Figure 7. For example, consider the case where the backtrack operation occurs from a branch of label $(\ell, \ell - 1, \ell - 1)$ in stage $\ell$. Suppose there are no mergers from the right child of this branch; then the backtrack operation splits the branch, creating a $WT_2$ branch of label $(\ell - 1, \ell - 1)$ and one $ST_1$ branch of label $\ell$. The root of the $WT_2$ branch is marked as visited, and the search for mergers continues from the $ST_1$ branch. If no more mergers are possible from this node, it is marked as visited, and a new lowest labeled strong node is picked. This procedure continues until there are no more unvisited $ST_1$ nodes of label $\ell$.






2. $p > 1$ and a merger is found to an unvisited root of a WT2 branch of label $(p-1, p-1)$: This creates an ST2 branch of label $(p, p-1, p-1)$ and the search for mergers continues from the right child of this branch. 

3. $p = 1$ and a merger is found to a WT1 branch which has label 0: This creates a new WT2 branch $(1, 0)$, incrementing the size of the current matching. The branches involved in this sequence of mergers are all marked as visited, and do not participate in any more mergers in stage $\ell$. The process of searching for mergers then starts with an unvisited $\ell$-labeled strong node if there is one, or else the stage terminates. 




Figure 7: (a) Branches before mergers (shown in dotted lines). (b) Branches after mergers until label $p+1$. (c) Branches after merger from $p+1$ to $p$. (d) Branches after lack of merger causes a backtrack; the branch with label $p$ is marked as having been visited. 

We have thus proved the following lemma, which holds inductively given that stage 1 ends with ST1, WT1, and WT2 branches. 

Lemma 5.2 With the merger/split/backtrack operations described above, a stage that begins with only WT1, WT2, and ST1 branches terminates with only these three branch types. 

Definition 5.3 A successful path of length $\ell$ is a sequence of mergers, at stage $\ell$, that starts at an ST1 branch of label $\ell$ and ends at a WT1 branch of label 0. 

A successful path increases the number of WT2 branches by 1, which is equivalent to increasing the size of the matching. The mergers that form a successful path are called successful mergers. The next lemma proves that the procedure of flagging nodes as visited does not block off any successful paths, implying that all successful paths of length $\ell$ are found in stage $\ell$. 

Lemma 5.3 A node that is marked as visited in stage $\ell$ can no longer be part of a successful path of length $\ell$. 

Proof: If a V2-node has been marked as visited after no merger has been found from its right child, then that child cannot lead to any merger in the current stage. Hence, once a backtrack occurs, the V2-node is marked as visited, and need not be visited again during that stage. 

If an unvisited V2-node $v$ of label $p$ belongs to a successful path, then it has a child of label $p+1$ after the successful merger. If $v$ were to participate in another successful merger at the same stage, the sequence of labels in this second successful path would be $\ell \to (\ell-1) \to \dots \to (p+1) \to p \to (p+1) \to p \to \dots \to 0$. The length of such a path is strictly greater than the two-edge distance $\ell$, since layer $p$ is visited twice in this path. Thus, each V2-node participates in at most one successful path of length $\ell$ in stage $\ell$. Hence, a node that was part of a successful merger is marked as visited, and need not be visited again during that stage. ■ 

Corollary 5.1 Each V2-node participates in at most one successful merger or one backtrack at each stage. 

Corollary 5.2 The labels of all lowest labeled strong nodes of label $\ell$ at stage $\ell$ strictly increase after the termination of stage $\ell$. 

Performing global relabeling is equivalent to generating a so-called layered network (as in Dinic's maximum flow algorithm). In a layered network, each layer consists of all branches with a particular label. In stage $\ell$, layer 0 consists of all nodes that have distance label 0, i.e., only WT1 branches. Layers 1 through $\ell-1$ consist of WT2 branches, and layer $\ell$ consists of ST1 branches. 

We now describe the procedure for generating the layered network, which is the critical part of our algorithm. Let the $k$-layer ($0 \leq k \leq \ell$) be the set of nodes with label $k$. The layered network can be generated by scanning all backward residual arcs from the sink using a breadth-first search (BFS). 

In a naive implementation of BFS one would start with all WT1 branches (label 0) and look at all incoming arcs in the residual network to generate the 1-layer. This could take $O(m)$ work and is expensive. An alternative approach is to check for each WT2 branch whether it is in the 1-layer by checking if there is an arc from its child node to a WT1 branch. Using the fact that labels are non-decreasing, we only need to check this for WT2 branches that were of label 1 in the previous layered network. 

Generating the 1-layer: The neighbors of each V1 child node of label 1 in a WT2 branch are scanned to identify a residual arc to a WT1 branch. Since there are at most $\kappa$ WT2 branches, we need to scan at most $\kappa$ neighbors of each V1-node of label 1 to identify a WT1 neighbor node or determine that none exists (in which case the branch does not belong to the 1-layer). If a WT1 node is not adjacent to a V1-node of label 1 in stage $\ell$, then it cannot be adjacent to that V1-node in any later stage, because no new WT1 branches are created in any stage. Hence, each arc needs to be scanned at most once throughout the algorithm. This is ensured by maintaining, for each V1-node, a pointer to the last arc scanned in a stage and resuming the search from that arc in the next stage. 

Claim 5.1 The total work done to generate the 1-layer throughout the algorithm is $O(\min\{\kappa^2, m\})$. 
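
The pointer idea behind Claim 5.1 can be illustrated with a minimal Python sketch. It is not the authors' C code: the class `ResumableScanner`, its method name, and the dictionary-based adjacency lists are hypothetical choices made for illustration. The sketch relies on the fact stated above that no new WT1 branches are created, so a neighbor that has stopped being free can be skipped permanently.

```python
from typing import Dict, List, Optional, Set


class ResumableScanner:
    """Per-node scan pointer that persists across stages (hypothetical helper)."""

    def __init__(self, adj: Dict[int, List[int]]) -> None:
        self.adj = adj
        self.ptr = {v: 0 for v in adj}  # index of the next arc to examine for v
        self.advances = 0               # number of arcs discarded so far

    def find_free_neighbor(self, v: int, free: Set[int]) -> Optional[int]:
        """Return a neighbor of v that is still free (a WT1 node), or None."""
        arcs = self.adj[v]
        while self.ptr[v] < len(arcs):
            w = arcs[self.ptr[v]]
            if w in free:
                # Keep the pointer here: w may still be free in a later stage.
                return w
            # w is no longer free and never becomes free again: discard the arc.
            self.ptr[v] += 1
            self.advances += 1
        return None
```

Because the pointer only moves forward, the number of discarded arcs over the whole run is bounded by the number of arcs, regardless of how many stages call `find_free_neighbor`.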

Generating layers 2 through $\ell-1$: Given the set of WT2 branches in layer $p$, the incoming residual arcs into the root of each WT2 branch in layer $p$ are examined to obtain neighbors in layer $p+1$. Scanning the incoming residual arcs of a WT2 branch stops if an ST1 neighbor is found, since then $p = \ell-1$: the WT2 branch is in the $(\ell-1)$-layer and its ST1 neighbor is thus in the $\ell$-layer. 

At most $\kappa$ incoming arc scans are required for each WT2 root to label all the WT2 branches in the next layer or find an ST1 branch. Since there are at most $\kappa$ roots of WT2 branches and each arc is scanned at most once in each stage, the total work done in generating layers 2 through $\ell-1$ at each stage is $O(\min\{\kappa^2, m\})$. 

Claim 5.2 The work done per stage to generate the layers 2 through $\ell-1$ is $O(\min\{\kappa^2, m\})$. 

The layered network generated has all layers of weak branches up to the $(\ell-1)$-layer, and some ST1 branches in the $\ell$-layer. This $\ell$-layer may not contain all the ST1 branches of label $\ell$ since not all incoming arcs to the WT2 branches were examined. However, by Corollary 5.1 it is sufficient to have at most one neighbor ST1 branch for every WT2 branch of label $\ell-1$. Therefore, instead of explicitly generating the entire $\ell$-layer, once an ST1 branch of label $\ell$ is found, it is determined that the WT2 branch is in the $(\ell-1)$-layer, the last layer of weak branches. 

For each unvisited WT2 branch of label $\ell-1$, we scan its incoming arcs to check for an ST1 neighbor. If such an ST1 branch is found, a sequence of mergers is initiated from this strong branch. If a successful path is found, or if no more mergers are possible from this strong branch, another unvisited WT2 branch in the $(\ell-1)$-layer is chosen and its incoming arcs are scanned to identify a new ST1 branch from which mergers are initiated. This continues until all the WT2 branches in the $(\ell-1)$-layer have been visited or have been scanned for a neighboring ST1 branch. 

There are at most $\kappa$ branches in the $(\ell-1)$-layer. For each such branch, at most $2\kappa$ incoming arcs need to be scanned to identify a new neighboring ST1 branch. Also, each arc in the network is examined at most once, so the work done per stage in identifying the necessary ST1 branches in the $\ell$-layer is $O(\min\{m, \kappa^2\})$. 

Each arc participates in at most one merger per stage, each of which requires $O(1)$ work; mergers thus require $O(\min\{\kappa^2, m\})$ work per stage. A backtrack operation is performed at most once for each WT2 branch, so the work done in backtracking is $O(\kappa)$ per stage. 

Lemma 5.4 The work done per stage, including generating the layered network, mergers, and backtrack operations, is $O(\min\{\kappa^2, m\})$. 

Lemma 5.5 The number of stages in the algorithm is $O(\sqrt{\kappa})$. 

The proof is along the lines of those of Even and Tarjan [15] and Hopcroft and Karp [21]. Details are provided in Section A of the appendix. 

Theorem 5.1 For input given in the form of adjacency lists, the complexity of the matching-pseudoflow algorithm is $O(\min\{n_1\kappa, m\} + \sqrt{\kappa}\,\min\{\kappa^2, m\})$. 

Proof: The complexity of initialization is $O(\min\{n_1\kappa, m\})$. There are $O(\sqrt{\kappa})$ stages in the algorithm, each of which takes $O(\min\{\kappa^2, m\})$. The work to generate layer 1 is $O(\min\{\kappa^2, m\})$ throughout the algorithm. The total complexity is therefore $O(\min\{n_1\kappa, m\} + \sqrt{\kappa}\,\min\{\kappa^2, m\})$. ■ 

A high-level description of the matching-pseudoflow algorithm is given in Figure 8. 

Note that the matching-pseudoflow algorithm could be viewed as an efficient implementation of Dinic's 
algorithm with two-edge pushes: a successful path of mergers is essentially an augmenting path, while the 
procedure for generating the layered network is the same once greedy initialization has been performed. 
Similarly, the matching-pseudoflow algorithm could also be interpreted as an implementation of push-relabel 
with two-edge pushes that uses delayed relabeling and global relabeling. 
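
To make the stage structure concrete (build a layered network, then search it for augmenting paths while flagging dead ends), the following is a minimal Python sketch of a standard Hopcroft-Karp style search. It is an illustration under stated assumptions, not the matching-pseudoflow itself: it labels nodes by distance from the free left-hand nodes, rebuilds the layers by scanning the whole graph in every stage (the cost that the matching-pseudoflow's global relabeling is designed to avoid), and uses recursion for brevity. The function name and the dictionary-based input format are hypothetical.

```python
from collections import deque
from typing import Dict, List


def hopcroft_karp(adj: Dict[int, List[int]], n_left: int, n_right: int) -> int:
    """Return the size of a maximum matching; adj maps left node -> right nodes."""
    INF = float("inf")
    match_l = [-1] * n_left    # right partner of each left node, or -1
    match_r = [-1] * n_right   # left partner of each right node, or -1
    dist = [INF] * n_left      # layer index of each left node in the current stage

    def bfs() -> bool:
        """Build the layers; report whether some free right node is reachable."""
        queue = deque()
        for u in range(n_left):
            if match_l[u] == -1:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = INF
        found = False
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                w = match_r[v]
                if w == -1:
                    found = True
                elif dist[w] == INF:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return found

    def dfs(u: int) -> bool:
        """Try to extend an augmenting path from u through the layers."""
        for v in adj.get(u, []):
            w = match_r[v]
            if w == -1 or (dist[w] == dist[u] + 1 and dfs(w)):
                match_l[u] = v
                match_r[v] = u
                return True
        dist[u] = INF  # dead end: u is not explored again in this stage
        return False

    size = 0
    while bfs():
        for u in range(n_left):
            if match_l[u] == -1 and dfs(u):
                size += 1
    return size
```

Setting `dist[u] = INF` on a dead end plays a role similar to marking a branch as visited after a backtrack: the node cannot be part of another augmenting path in the same stage.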

5.1 Matching-pseudoflow with word operations 

The complexity of the matching-pseudoflow algorithm can be further improved to $O(\min\{n_1\kappa, n_1 n_2/\lambda, m\} + \kappa^2 + \kappa^{2.5}/\lambda)$ using boolean word operations, where $\lambda$ is the length of a word, as done by Cheriyan and Mehlhorn [11]. The key idea is to represent the graph adjacency structure using words and to perform boolean operations on these words to find merger arcs. Details are provided in Section B of the appendix. 
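
As a rough illustration of the word-based idea, the Python sketch below stores each node's adjacency list as a bit mask and finds a candidate merger target with a single AND. It does not reproduce the SUB-IN and SUB-OUT words of the appendix; the function names are hypothetical, and a Python integer of arbitrary size stands in for the roughly $n/\lambda$ machine words that a fixed word length $\lambda$ would require.

```python
from typing import List, Optional


def build_masks(adj: List[List[int]]) -> List[int]:
    """Encode each adjacency list as an integer with bit w set for neighbor w."""
    masks = []
    for neighbors in adj:
        mask = 0
        for w in neighbors:
            mask |= 1 << w
        masks.append(mask)
    return masks


def lowest_set_bit(x: int) -> int:
    """Index of the least significant set bit of a positive integer."""
    return (x & -x).bit_length() - 1


def find_merger_target(adj_mask: int, candidates_mask: int) -> Optional[int]:
    """Return some neighbor that is also a candidate, or None if there is none."""
    hit = adj_mask & candidates_mask
    return lowest_set_bit(hit) if hit else None
```

Here `candidates_mask` would hold, for example, the unvisited nodes of the next lower label; one AND then replaces a scan over the individual arcs.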

5.2 A combined algorithm 

We follow the approach of Alt et al. [5] to combine the matching-pseudoflow with and without words to derive new complexity bounds. The new bound is obtained by applying the matching-pseudoflow without words until a certain stage $\ell$ and then using word operations for the rest of the algorithm. The greedy initialization procedure is performed with words, which has a complexity of $O(\min\{n_1\kappa, n_1 n_2/\lambda, m\})$. The words SUB-IN and SUB-OUT described in Section B are also constructed irrespective of the algorithm used. These two operations have a complexity of $O(\min\{n_1\kappa, n_1 n_2/\lambda, m\} + \kappa^2)$. 

/* The procedure finds a maximum matching in a bipartite graph G = (V1 ∪ V2, E). It terminates with 
   a set of ST1, WT1, and WT2 branches; the set of edges in the WT2 branches form a maximum 
   cardinality matching. */ 

procedure matching-pseudoflow: 
begin 
    Generate a greedy maximal matching of ST1, WT1, and WT2 branches; 
    Generate a layered network; 
    Mark all nodes in the layered network as unvisited; 
    while the lowest label of an ST1 branch is less than |V1| do 
        while ∃ a lowest labeled unvisited V1-node v of label ℓ do 
            if ∃ a merger from v to an unvisited node of label (ℓ-1) do 
                Perform merger (as in Figures 7(a)-(b)); 
                if merger leads to an augmentation do 
                    Mark all nodes along the successful path as visited; 
            else do 
                Mark branch containing node v as visited; 
                Perform backtrack (as in Figures 7(c)-(d)); 
        Generate a new layered network; 
        Mark all nodes in the layered network as unvisited; 
end 

Figure 8: High-level description of the matching-pseudoflow algorithm. 



Case 1: $\kappa^2 \in O(m)$ 

The work done until stage $\ell$ without using words is $O(\ell\kappa^2)$. Following analysis similar to that in the proof of Lemma 5.5, the remaining number of stages is $\kappa/\ell$. The work done beyond stage $\ell$ using word operations is $O(\kappa^3/(\ell\lambda))$. The value of $\ell$ that minimizes the total work done is obtained by solving for $\ell$ in the equation 

$\ell\kappa^2 = \kappa^3/(\ell\lambda)$. 

This yields $\ell = \sqrt{\kappa/\lambda}$ and an overall complexity of $O(\min\{n_1\kappa, n_1 n_2/\lambda, m\} + \kappa^2 + \kappa^{2.5}/\sqrt{\lambda})$, which is dominated by the complexity of the algorithm with word operations. Thus, when $\kappa^2 \in O(m)$, combining the two algorithms does not provide any benefit. 

Case 2: $\kappa^2 \in \Omega(m)$ 

The work done until stage $\ell$ is $O(\ell m)$. We again find the best value of $\ell$ by solving $\ell m = \kappa^3/(\ell\lambda)$, which gives $\ell = \kappa^{1.5}/\sqrt{\lambda m}$. This leads to an overall complexity of $O(\min\{n_1\kappa, n_1 n_2/\lambda, m\} + \kappa^2 + \kappa^{1.5}\sqrt{m/\lambda})$. This is better than the $\sqrt{\kappa}\,m$ complexity when $\kappa^2 \in O(m\lambda)$. Table 1 summarizes the complexity results. 

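Algorithm                 Best when                                    Complexity 
With word operations      $\kappa^2 \in O(m)$                          $O(\min\{n_1\kappa, n_1 n_2/\lambda, m\} + \kappa^2 + \kappa^{2.5}/\lambda)$ 
Combined                  $\kappa^2 \in \Omega(m) \cap O(\lambda m)$   $O(\min\{n_1\kappa, n_1 n_2/\lambda, m\} + \kappa^2 + \kappa^{1.5}\sqrt{m/\lambda})$ 
Without word operations                                                $O(\min\{n_1\kappa, m\} + \sqrt{\kappa}\,\min\{\kappa^2, m\})$ 

Table 1: Summary of complexity results for the matching-pseudoflow algorithm. 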



Note that the complexity expressions for the matching-pseudoflow algorithm without words, the combined algorithm, and the matching-pseudoflow algorithm with words are correspondingly faster than the algorithms of Hopcroft and Karp [21] with complexity $O(\sqrt{\kappa}\,m)$, Alt et al. [5] with complexity $O(n^{1.5}\sqrt{m/\lambda})$, and Cheriyan and Mehlhorn [11] with complexity $O(n^{2.5}/\lambda)$. 

6 An experimental study 



6.1 Implementations 

We developed eight pseudoflow implementations for bipartite matching: 

1. Five "regular" pseudoflow implementations: highest label with FIFO buckets (pseudo_hi_fifo), highest label with LIFO buckets (pseudo_hi_lifo), highest label with Wave buckets (pseudo_hi_wave), lowest label with FIFO buckets (pseudo_lo_fifo), and lowest label with LIFO buckets (pseudo_lo_lifo). 

2. Two "free-arcs" variants, pseudo_hi_free and pseudo_lo_free, which are the highest and lowest label implementations of the free-arcs pseudoflow algorithm. Both of these implementations use LIFO buckets, which were found to be fastest in initial testing. We use a global relabeling heuristic that periodically re-computes distance labels to all V1-nodes in the graph. 

3. The matching-pseudoflow algorithm. 

The latest version of the code (version 1.01) is available at [5j. 

Cherkassky et al. [12] developed the following algorithms for bipartite matching that implement "two-edge" pushes: 

• bim_dfs and bim_bfs: These two variants apply a simple depth-first-search and breadth-first-search 
respectively to find augmenting s-t paths. 

• pr_bim_hi, pr_bim_lo, and pr_bim_fifo: These are implementations of the highest label, lowest label, and 
FIFO push-relabel variants respectively. 

• bim_ar: The "augment-relabel" algorithm could be thought of as a hybrid between an augmenting path 
algorithm and push-relabel. It is similar in spirit to the basic algorithm described by Alt et al. for 
the bipartite matching problem. 

• bim_lds: The "label-directed-search" variant uses a depth-first-search along with "approximate" distance labels that are periodically updated using global relabeling. 

In addition, we tested dinic, an implementation of Dinic's algorithm by Setubal [25], and abmp, a simplified implementation by Setubal [26] of the algorithm of Alt et al. [5] that is available as part of the BIPM solvers for bipartite matching [1]. While dinic was shown to have poor performance in practice, we use it mainly to compare it to matching-pseudoflow, which is its closest pseudoflow counterpart. 

The pseudoflow codes, dinic, and abmp were written in C and compiled with the gcc compiler, while those of Cherkassky et al. [12] were written in C++ and compiled using the g++ compiler. The -O4 compiler optimization flag was used in all cases. 

6.2 Computing environment 

The experiments were run on a Sun UltraSPARC workstation with a 270 MHz CPU and 192 MB of RAM. The results of the machine calibration experiment as suggested by the First DIMACS Implementation Challenge [2] are shown in Table 2. 

                   Test 1                    Test 2 
                   real    user    system    real    user    system 
No optimization    0.4     0.4     0.0       3.3     3.3     0.0 
-O4 flag           0.2     0.1     0.0       2.0     1.9     0.0 

Table 2: Average running times for DIMACS machine calibration tests. 

6.3 Differences between matching-pseudoflow and dinic in practice 



As noted earlier, the matching-pseudoflow algorithm has parallels to Dinic's algorithm (global relabeling is 
equivalent to generating a layered network in Dinic's algorithm, and a successful path is equivalent to an s-t 
augmenting path). However, what sets the matching-pseudoflow algorithm apart from Dinic's algorithm is 
the manner in which global relabeling is performed. In this section, we demonstrate that the global relabeling 
procedure which leads to a better theoretical complexity also makes a significant difference in practice. 

In the matching-pseudoflow, only the nodes in the current matching and their adjacent edges are examined 
during global relabeling, whereas in Dinic's algorithm the entire network (including all unmatched V2-nodes) 
and their adjacent arcs are examined to construct the layered network. Therefore, in practice, we would expect the run-time of the matching-pseudoflow algorithm to depend largely on $\kappa$, and that of Dinic's algorithm to depend on $m$. 

We implemented the matching-pseudoflow algorithm and compared it to the best-known implementation of Dinic's algorithm for bipartite matching [25]. Instances were generated in the following manner: given $n_1$, $n_2$, and the expected number of edges $m$, a graph is generated where each of the possible $n_1 n_2$ edges exists independently with probability $m/(n_1 n_2)$. We generated problems with $n_1 = 16384$ and $n_2/n_1$ ranging from 1 to 1.2 in steps of 0.01. Thus, the most unbalanced graph had 19661 V2-nodes. For each of the 20 classes, we generated graphs with expected number of edges 81920, 163840, 245760, and 327680 respectively, which resulted in $\kappa$ being exactly or very close to $n_1$. For each of the combinations of $n_2$ and $m$, we generated 10 instances and the time for each instance was averaged over 5 runs. Thus, each data point is the average of 50 runs. The run-times are shown in Figure 9. 

Figure 9: Run-times of the matching-pseudoflow and Dinic's algorithms on random unbalanced instances. 

There are two key observations to be made from the results. 

1. The matching-pseudoflow algorithm is more robust to imbalances in the graph. For Dinic's algorithm, the balanced instances are the hardest to solve; even small imbalances in the graph ($n_2 = 1.01 n_1$) make drastic differences to the run-time. 

2. The run-time of Dinic's algorithm goes up with the number of edges, but the run-time of the matching- 
pseudoflow is virtually independent of the number of edges. In fact, the hardest instances for the 
matching-pseudoflow appear to be the ones with fewest arcs. 
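
The random instance family used for this comparison (each of the $n_1 n_2$ possible edges present independently with probability $m/(n_1 n_2)$) can be sketched in Python as follows. The authors' generator is not given in the text, so the function name, the `seed` parameter, and the edge-list output format are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def random_bipartite(n1: int, n2: int, m_expected: float,
                     seed: int = 0) -> List[Tuple[int, int]]:
    """Sample a bipartite graph with about m_expected edges.

    Each of the n1 * n2 possible edges (u, v) is included independently
    with probability m_expected / (n1 * n2).
    """
    p = m_expected / (n1 * n2)
    if not 0.0 <= p <= 1.0:
        raise ValueError("m_expected must lie between 0 and n1 * n2")
    rng = random.Random(seed)  # seeded so that instances are reproducible
    return [(u, v) for u in range(n1) for v in range(n2) if rng.random() < p]
```

This direct version performs $n_1 n_2$ random draws, which is slow in pure Python at the sizes used above ($n_1 = 16384$); it is meant to show the sampling model rather than to be an efficient generator.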



6.4 Test instances 

We tested the algorithms on the seven problem families (hilo, fewg, manyg, grid, hexa, rope, and zipf) used by Cherkassky et al. [12]. All the benchmark instances were balanced, i.e., $n_1 = n_2$. The instances are described in greater detail in Section D of the appendix. 

For each instance family, we report the results of our experiments for 

• Five pseudoflow implementations: pseudo_lo_lifo and pseudo_hi_wave, which were found in initial testing to be the fastest variants for the lowest and highest label algorithms respectively, pseudo_lo_free, pseudo_hi_free, and matching-pseudoflow. All the pseudoflow variants were initialized with a greedy matching. 

• Three implementations of Cherkassky et al. [12]: pr_bim_hi, pr_bim_lo, and the best implementation among pr_bim_fifo, bim_dfs, bim_bfs, bim_ar, and bim_lds. The pr_bim_hi and pr_bim_lo implementations were tested on all families to compare them to the free-arcs pseudoflow variants. 

• Implementations abmp and dinic. 



6.5 Results 



• Hi-lo: The run-times and operation counts for hilo instances are presented in Figure 10 and Table 3 respectively. 

The hilo family was designed to be much harder for the highest label push-relabel algorithm than the lowest label variant. As expected, pseudo_hi_free and pr_bim_hi are the slowest, though the former is faster than the latter. The pseudo_lo_free variant is the fastest of all algorithms, and is more than twice as fast as pr_bim_lo. 

Interestingly, pseudo_hi_wave is faster than pseudo_lo_lifo, showing once again that pseudoflow and push-relabel have very different behavior. 

The pseudo_hi_wave and bim_bfs algorithms show the best scaling behavior and are likely to be faster than pseudo_lo_free on larger instances. The matching-pseudoflow algorithm shows poor scaling behavior; it is faster than dinic on smaller instances but becomes slower on large instances. 

The pseudo_hi_wave algorithm performs fewer arc scans and pushes (the dominant operations) than pseudo_lo_free, yet is slower. This suggests that the simplicity of the free-arcs implementations results in performance gains, since simpler code often leads to better compiler optimization. 

• Fewg: The run-times and operation counts for fewg instances are presented in Figure 11 and Table 4 respectively. 

The pseudo_hi_free and pseudo_lo_free algorithms are the fastest, and are more than twice as fast as the 
next-best algorithms (pr_bim_hi and pr_bim_lo). The difference seems to be in the number of arc scans 
performed. 

The matching-pseudoflow and dinic algorithms are the slowest, though matching-pseudoflow is faster on 
all instance sizes. 

• Manyg: The run-times and operation counts for manyg instances are presented in Figure 12 and Table 5 respectively. 

The results are similar to the fewg instances. The pseudo_hi_free and pseudo_lo_free algorithms are the fastest, and are more than twice as fast as the next-best implementations (pr_bim_hi and pr_bim_lo), which is reflected in the number of arc scans performed. 

While the matching-pseudoflow implementation is faster than dinic on all instance sizes, dinic appears 
to scale better and is likely to be faster on larger instances. 

• Grid: The run-times and operation counts for grid instances are presented in Figure 13 and Table 6 respectively. 

The scaling behavior of all the pseudoflow variants is extremely non-robust, making a comparison of the algorithms difficult. However, the pseudo_hi_free variant is the fastest on all instance sizes with the pseudo_lo_free variant close behind. These variants are more than twice as fast as the next-best algorithms (pr_bim_hi and pr_bim_lo). 

The matching-pseudoflow algorithm is faster than dinic, although its scaling behavior is not robust. 

The pseudo_hi_free, pseudo_lo_free, pr_bim_hi and pr_bim_lo algorithms did not perform any global relabeling. Hence, this would be a good family to understand the fundamental differences between the four implementations. We see that the push-relabel variants perform a greater number of each of the operations; however, it is difficult to draw strong conclusions due to the non-robust scaling behavior of the pseudoflow variants. 

• Hexa: The run-times and operation counts for hexa instances are presented in Figure 14 and Table 7 respectively. 

The pseudo_hi_free and pseudo_lo_free algorithms are the fastest, followed by pr_bim_lo which is 1.5-1.8 
times slower, which is reflected in the number of arc scans performed. 

The matching-pseudoflow algorithm is faster than dinic by a similar factor. 



• Rope: The run-times and operation counts for rope instances are presented in Figure 15 and Table 8 respectively. 

This was the only family where abmp showed good performance, and is the fastest of all algorithms. 
The matching-pseudoflow is only marginally slower. Both matching-pseudoflow and pseudo_lo_free scale 
better than abmp and are likely to be faster on larger instances. The matching-pseudoflow algorithm is 
much faster than dinic, while the pseudo_hi_free algorithm is an order of magnitude faster than pr_bim_hi. 

The operation counts do not provide much insight. 



• Zipf: The run-times and operation counts for zipf instances are presented in Figure 16 and Table 9 respectively. 

The pseudo_hi_free algorithm is the fastest, with matching-pseudoflow close behind. The next best algorithm is pseudo_lo_free (note that this is the only family in which pseudo_lo_free is not the best or nearly best algorithm). 

The difference between the highest and lowest label variants seems to be due to the fact that no global 
relabels are triggered in the highest label variant, while the lowest label variants perform one relabel. 



7 Discussion 

We developed several variants of the pseudoflow algorithm for bipartite matching. One variant, the matching- 
pseudoflow algorithm was shown to have the best-known theoretical complexity for the problem. While the 
matching-pseudoflow could be viewed as a specialized implementation of Dinic's algorithm, we believe that the 
matching-pseudoflow is a natural extension of the generic pseudoflow algorithm, whereas Dinic's algorithm 
requires a greater degree of adaptation from its widely-accepted form. We also compared the matching- 
pseudoflow to Dinic's algorithm to point out the key differences between the two algorithms. 

We also developed several implementations of our algorithms and compared them to the fastest available codes based on the push-relabel algorithm. We draw the following conclusions from our experiments. 

• Our best implementation was faster than that of Cherkassky et al. [12] on each problem family tested. The pseudo_lo_free algorithm was the fastest or nearly fastest algorithm in six of the seven instance classes tested. On the remaining family (zipf), it was the third-fastest implementation and was within a factor of 2 of the fastest implementation. We hence declare this to be the best pseudoflow variant overall and recommend that it be the algorithm of choice when solving bipartite matching problems. 

• The pseudo_lo_free variant was generally faster than the pseudo_hi_free variant. This is consistent with 
the behavior of push-relabel where the lowest label variant was found to be faster than the highest 
label variant. However, in the regular pseudoflow variant (without free arcs), the highest label variant 
was generally faster than the lowest label variant. 

• While pseudo_lo_free and pseudo_hi_free could be viewed as special implementations of the push-relabel algorithm with a two-edge push, they are uniformly faster than the push-relabel implementations of Cherkassky et al. [12]. 

This difference is not due only to the different global relabeling frequency. In the grid instances where 
no global relabeling was performed, push-relabel variants performed more operations such as arc scans 
and pushes than the pseudoflow variants. 

• Although implementations based on the regular pseudoflow algorithm (i.e., without free arcs) were faster than push-relabel for unit capacity networks [10], their performance is unimpressive for bipartite matching. This is surprising given that bipartite matching is a special case of unit capacity networks. 
In general, pseudo_hi_wave and pseudo_lo_lifo were at least a factor of 2 slower than the fastest algorithm. 
However, their performance was comparable to that of pr_bim_hi and pr_bim_lo on four of the families. 

• The matching-pseudoflow algorithm is generally faster than dinic, and is nearly the fastest algorithm on 
two instance families. This is particularly interesting because the matching-pseudoflow algorithm could 
be viewed as an efficient implementation of Dinic's algorithm. Past experimental studies [Ml IH] 
have dismissed Dinic's algorithm as not being competitive in practice. However, the results here show 
that a careful implementation of Dinic's algorithm (i.e., the matching-pseudoflow) can be very efficient 
in practice. 

• The experiments comparing matching-pseudoflow and dinic on random graphs clearly show that the theoretically efficient global relabeling procedure is efficient in practice as well. 

On benchmark instances, matching-pseudoflow often performed a much greater number of global relabels. This is because matching-pseudoflow generates the layered network only until the lowest labeled layer of excess nodes and finds a blocking flow in this network, while dinic creates a layered network consisting of all excess nodes in the network and finds a blocking flow in this network. 

Figure 10: Actual and relative run times for hilo instances. 

Table 3: Operation counts for hilo instances. 

Figure 11: Actual and relative run times for fewg instances. 

Table 4: Operation counts for fewg instances. 

Actual run times, with relative run times in parentheses. 

Nodes / Arcs          pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow 
32,768 / 82,077       [lost]          [lost]          0.1448          0.1402          0.497 
                      (1.956)         (1.765)         (1.033)         (1.000)         (3.545) 
                      abmp            dinic           pr_bim_hi       pr_bim_lo       bim_lds 
                      0.3914          0.6356          0.2948          0.2386          0.354 
                      (2.792)         (4.534)         (2.103)         (1.702)         (2.525) 

Nodes / Arcs          pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow 
65,536 / 163,719      0.5922          0.5512          0.2934          0.3036          1.2324 
                      (2.018)         (1.879)         (1.000)         (1.035)         (4.200) 
                      abmp            dinic           pr_bim_hi       pr_bim_lo       bim_lds 
                      0.794           1.4626          0.6418          0.566           0.801 
                      (2.706)         (4.985)         (2.187)         (1.929)         (2.730) 

Nodes / Arcs          pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow 
131,072 / 327,587     1.3346          1.3086          0.713           0.6648          4.1976 
                      (2.008)         (1.968)         (1.073)         (1.000)         (6.314) 
                      abmp            dinic           pr_bim_hi       pr_bim_lo       bim_lds 
                      2.105           4.9986          1.6166          1.4522          1.884 
                      (3.166)         (7.519)         (2.432)         (2.184)         (2.834) 

Nodes / Arcs          pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow 
262,144 / 654,959     2.62            2.53            1.4326          1.3702          9.7666 
                      (1.912)         (1.846)         (1.046)         (1.000)         (7.128) 
                      abmp            dinic           pr_bim_hi       pr_bim_lo       bim_lds 
                      4.6992          9.955           2.9092          2.8892          3.6358 
                      (3.430)         (7.265)         (2.123)         (2.109)         (2.653) 

Nodes / Arcs          pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow 
524,288 / 1,310,160   5.9466          5.9696          2.7418          2.9456          26.5288 
                      (2.169)         (2.177)         (1.000)         (1.074)         (9.676) 
                      abmp            dinic           pr_bim_hi       pr_bim_lo       bim_lds 
                      10.8922         27.7888         6.0768          6.1736          7.6666 
                      (3.973)         (10.135)        (2.216)         (2.252)         (2.796) 

Figure 12: Actual and relative run times for manyg instances. 

Nodes: 32,768   Arcs: 82,077 
            pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow 
Arc scans   192,549         184,781         248,068         232,279         - 
Pushes      132,243         152,523         106,419         99,685          - 
Relabels    91,643          88,978          44,199          41,694          - 
Updates     -               -               1               1               21 
Mergers     35,779          33,840          61,348          57,981          - 
Depth       3.7             4.5             1.7             1.7             - 

            abmp            dinic           pr_bim_hi       pr_bim_lo       bim_lds 
Arc scans   652,624         -               692,732         550,212         839,879 
Pushes      -               -               131,103         104,488         45,333 
Relabels    -               -               46,067          36,361          - 
Updates     4               9               1               1               - 

Nodes: 65,536   Arcs: 163,719 
            pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow 
Arc scans   327,169         331,021         404,774         399,765         - 
Pushes      284,422         306,709         177,838         173,486         - 
Relabels    147,392         152,630         71,146          70,413          - 
Updates     -               -               [lost]          [lost]          [lost] 
Mergers     69,667          67,297          105,191         103,015         - 
Depth       4.1             4.6             1.7             1.7             - 

            abmp            dinic           pr_bim_hi       pr_bim_lo       bim_lds 
Arc scans   1,224,438       -               1,002,185       928,942         1,371,693 
Pushes      -               -               202,282         186,948         85,372 
Relabels    -               -               69,369          64,205          - 
Updates     4               9               1               [lost]          - 

Nodes: 131,072   Arcs: 327,587 
            pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow 
Arc scans   931,930         920,992         1,130,186       1,112,806       - 
Pushes      671,855         779,392         482,875         473,092         - 
Relabels    481,255         481,382         205,411         203,835         - 
Updates     -               -               1               1               29 
Mergers     143,409         136,491         273,993         269,102         - 
Depth       4.7             5.7             1.8             1.8             - 

            abmp            dinic           pr_bim_hi       pr_bim_lo       bim_lds 
Arc scans   3,148,755       -               3,840,908       3,296,800       4,337,322 
Pushes      -               -               694,463         604,050         200,422 
Relabels    -               -               249,093         216,005         - 
Updates     5               14              2               2               - 


pseudo_hi_wave 


pseudo_lo_lifo 


pseudo_hi_free 


pseudo_lo_free 


matching-pseudoflow 




Arc scans 


1,670,193 


1,667,586 


2,095,030 


2,100,661 


- 




Pushes 


1,449,080 


1,594,757 


903,041 


896,141 


- 




R,elabels 


833,720 


843,721 


380,154 


382,628 


- 


262,144 


Updates 


- 


- 


1 


1 


31 


654,959 


Mergers 


282,170 


270,932 


516,616 


513,166 


- 




Depth 


5.1 


5.9 


1.7 


1.7 








abmp 


dinic 


pr_bim_hi 


pr_bim_lo 


bim.lds 




Arc scans 


6,435,332 


- 


6,718,883 


6,584,312 


7,801,467 




Pushes 


- 


- 


1,238,835 


1,205,123 


388,275 




Relabels 


- 


- 


440,603 


430,411 


- 




Updates 


5 


15 


2 


2 


- 






pseudo_hi_wave 


pseudo_lo_lifo 


pseudo_hi_free 


pseudo_[o_free 


matching-pseudoflow 




Arc scans 


2,905,666 


2,920,705 


3,979,979 


3,900,105 






Pushes 


2,815,922 


3,351,759 


1,721,001 


1,668,690 






Relabels 


1,380,457 


1,412,220 


717,924 


703,407 




524,288 


Updates 






1 


1 


29 


1,310,160 


Mergers 


562,195 


540,975 


990,694 


964,539 






Depth 


5.0 


6.2 


1.7 


1.7 








abmp 


dinic 


pr_bim_hi 


pr_bim_lo 


bim.lds 




Arc scans 


11,189,842 




10,377,914 


10,398,863 


13,327,851 




Pushes 






2,001,388 


1,993,899 


738,286 




Relabels 






699,959 


702,447 






Updates 


4 


13 


1 


1 





Table 5: Operation counts for manyg instances. 
n = 32,768, m = 98,304

Algorithm            Actual   Relative
pseudo_hi_wave       0.412    (4.736)
pseudo_lo_lifo       0.286    (3.283)
pseudo_hi_free       0.087    (1.000)
pseudo_lo_free       0.091    (1.046)
matching-pseudoflow  0.396    (4.552)
abmp                 0.284    (3.269)
dinic                0.856    (9.839)
pr_bim_hi            0.155    (1.784)
pr_bim_lo            0.163    (1.878)
bim_lds              0.233    (2.676)

n = 65,536, m = 196,608

Algorithm            Actual   Relative
pseudo_hi_wave       0.514    (3.764)
pseudo_lo_lifo       0.428    (3.132)
pseudo_hi_free       0.137    (1.000)
pseudo_lo_free       0.143    (1.044)
matching-pseudoflow  0.941    (6.889)
abmp                 0.595    (4.354)
dinic                1.670    (12.227)
pr_bim_hi            0.313    (2.288)
pr_bim_lo            0.322    (2.359)
bim_lds              0.462    (3.382)

n = 131,072, m = 393,216

Algorithm            Actual   Relative
pseudo_hi_wave       1.106    (7.464)
pseudo_lo_lifo       1.142    (7.707)
pseudo_hi_free       0.148    (1.000)
pseudo_lo_free       0.154    (1.040)
matching-pseudoflow  0.687    (4.636)
abmp                 0.653    (4.404)
dinic                2.227    (15.030)
pr_bim_hi            0.670    (4.524)
pr_bim_lo            0.693    (4.673)
bim_lds              0.932    (6.291)

n = 262,144, m = 786,432

Algorithm            Actual   Relative
pseudo_hi_wave       4.702    (7.184)
pseudo_lo_lifo       5.093    (7.780)
pseudo_hi_free       0.655    (1.000)
pseudo_lo_free       0.670    (1.023)
matching-pseudoflow  5.315    (8.120)
abmp                 3.319    (5.070)
dinic                10.269   (15.687)
pr_bim_hi            1.369    (2.091)
pr_bim_lo            1.392    (2.126)
bim_lds              2.057    (3.143)

n = 524,288, m = 1,572,864

Algorithm            Actual   Relative
pseudo_hi_wave       4.365    (4.304)
pseudo_lo_lifo       4.612    (4.548)
pseudo_hi_free       1.014    (1.000)
pseudo_lo_free       1.133    (1.117)
matching-pseudoflow  15.789   (15.568)
abmp                 7.158    (7.058)
dinic                29.642   (29.227)
pr_bim_hi            3.535    (3.486)
pr_bim_lo            3.787    (3.734)
bim_lds              4.855    (4.787)



Figure 13: Actual and relative run times for grid instances. 
n = 32,768, m = 98,304

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  132,434         131,001         154,177         153,498         -
Pushes     221,811         172,938         58,471          57,805          -
Relabels   38,819          38,902          20,801          20,689          -
Updates    -               -                                               10
Mergers    32,806          32,075          37,427          37,094          -
Depth      6.8             5.4             1.6             1.6             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_lds
Arc scans  334,958      -            284,015      279,646      463,747
Pushes     -            -            56,947       56,028       30,079
Relabels   -            -            17,714       17,358       -
Updates    1            6                                      -

n = 65,536, m = 196,608

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  250,158         249,170         219,660         216,645         -
Pushes     347,171         293,260         87,502          86,312          -
Relabels   70,092          70,482          27,238          26,745          -
Updates    -               -                                               11
Mergers    64,404          63,225          60,135          59,540          -
Depth      5.4             4.6             1.5             1.4             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_lds
Arc scans  714,586      -            513,351      509,895      871,303
Pushes     -            -            101,531      100,682      57,070
Relabels   -            -            31,510       31,205       -
Updates    1            4                                      -

n = 131,072, m = 393,216

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  500,303         492,658         182,216         182,216         -
Pushes     807,174         773,513         89,741          89,741          -
Relabels   154,580         151,290         12,103          12,103          -
Updates    -               -                                               10
Mergers    124,654         121,033         77,639          77,639          -
Depth      6.5             6.4             1.2             1.2             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_lds
Arc scans  591,966      -            1,105,938    1,083,652    1,672,692
Pushes     -            -            216,402      211,837      107,777
Relabels   -            -            65,785       63,924       -
Updates                 4                                      -

n = 262,144, m = 786,432

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  1,006,773       999,686         980,473         936,799         -
Pushes     2,263,080       1,543,859       390,886         377,017         -
Relabels   282,641         282,317         129,561         122,832         -
Updates    -               -                                               15
Mergers    257,599         253,230         260,979         254,045         -
Depth      8.8             6.1             1.5             1.5             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_lds
Arc scans  3,724,682    -            2,023,942    2,012,307    3,375,505
Pushes     -            -            399,144      396,445      221,951
Relabels   -            -            124,220      123,233      -
Updates    2            6                                      -

n = 524,288, m = 1,572,864

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  2,183,335       2,162,769       1,267,945       1,266,556       -
Pushes     1,769,132       1,827,967       513,985         512,656         -
Relabels   747,854         751,059         123,635         123,640         -
Updates
Mergers    528,271         503,973         388,065         387,400         -
Depth      3.3             3.6             1.3             1.3             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_lds
Arc scans  7,139,976    -            5,644,084    5,604,995    8,585,263
Pushes     -            -            1,175,331    1,152,474    509,287
Relabels   -            -            360,887      359,436      -
Updates    2            10                                     -











Table 6: Operation counts for grid instances. 
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0.297 
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0.117 


0.373 
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(2.686) 


(2.544) 
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(3.199) 
n = 65,536, m = 196,608

Algorithm            Actual   Relative
pseudo_hi_wave       0.589    (2.434)
pseudo_lo_lifo       0.666    (2.750)
pseudo_hi_free       0.242    (1.000)
pseudo_lo_free       0.273    (1.127)
matching-pseudoflow  0.904    (3.736)
abmp                 0.549    (2.269)
dinic                1.379    (5.698)
pr_bim_hi            0.490    (2.024)
pr_bim_lo            0.448    (1.850)
bim_ar               0.596    (2.463)

n = 131,072, m = 393,216

Algorithm            Actual   Relative
pseudo_hi_wave       1.210    (2.228)
pseudo_lo_lifo       1.325    (2.441)
pseudo_hi_free       0.543    (1.000)
pseudo_lo_free       0.547    (1.007)
matching-pseudoflow  1.955    (3.600)
abmp                 1.275    (2.348)
dinic                3.382    (6.229)
pr_bim_hi            1.042    (1.919)
pr_bim_lo            0.947    (1.744)
bim_ar               1.228    (2.261)

n = 262,144, m = 786,432

Algorithm            Actual   Relative
pseudo_hi_wave       3.932    (2.579)
pseudo_lo_lifo       4.976    (3.264)
pseudo_hi_free       1.629    (1.068)
pseudo_lo_free       1.525    (1.000)
matching-pseudoflow  4.321    (2.834)
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2.946 


8.602 


3.397 
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(1.677) 
n = 524,288, m = 1,572,864

Algorithm            Actual   Relative
pseudo_hi_wave       8.344    (2.959)
pseudo_lo_lifo       9.318    (3.304)
pseudo_hi_free       2.890    (1.025)
pseudo_lo_free       2.820    (1.000)
matching-pseudoflow  10.176   (3.608)
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pr.bimjo 
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6.429 
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(2.517) 


(1.816) 


(2.643) 



Figure 14: Actual and relative run times for hexa instances. 
n = 32,768, m = 98,304

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  248,522         218,196         225,952         218,890         -
Pushes     170,434         245,992         79,222          77,716          -
Relabels   92,615          79,810          30,985          31,041          -
Updates    -               -               1               1               8
Mergers    33,431          31,771          47,412          46,659          -
Depth      5.1             7.7             1.7             1.7             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_ar
Arc scans  369,751      -            535,122      470,222      789,152
Pushes     -            -            92,042       82,743       18,931
Relabels   -            -            32,278       28,995       -
Updates    2            4            1            1            -

n = 65,536, m = 196,608

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  458,751         401,241         439,941         413,710         -
Pushes     322,442         490,279         155,760         147,014         -
Relabels   165,117         140,819         60,672          57,802          -
Updates    -               -               1               1               9
Mergers    66,349          63,366          93,468          89,095          -
Depth      4.9             7.7             1.7             1.7             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_ar
Arc scans  717,931      -            876,664      907,438      1,248,027
Pushes     -            -            153,849      153,994      37,924
Relabels   -            -            52,928       53,413       -
Updates    2            5            1            1            -

n = 131,072, m = 393,216

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  924,762         836,356         894,516         846,603         -
Pushes     654,422         1,302,141       319,055         301,013         -
Relabels   334,360         297,826         125,278         119,074         -
Updates    -               -               1               1               10
Mergers    131,790         126,412         190,691         181,670         -
Depth      5.0             10.3            1.7             1.7             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_ar
Arc scans  1,724,821    -            2,215,824    1,837,612    3,025,906
Pushes     -            -            374,792      304,199      75,885
Relabels   -            -            131,342      105,363      -
Updates    2            5            1            1            -

n = 262,144, m = 786,432

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  1,678,375       1,464,907       1,705,404       1,726,744       -
Pushes     1,424,488       1,958,491       609,616         611,618         -
Relabels   581,244         492,883         236,370         242,745         -
Updates    -               -               1                               11
Mergers    264,336         254,686         367,131         368,132         -
Depth      5.4             7.7             1.7             1.7             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_ar
Arc scans  3,204,871    -            3,721,038    3,403,393    5,406,653
Pushes     -            -            655,858      585,720      152,163
Relabels   -            -            226,987      201,954      -
Updates    2            5            1                         -

n = 524,288, m = 1,572,864

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  3,891,630       3,719,454       4,006,828       3,931,486       -
Pushes     3,938,423       5,722,941       1,419,725       1,385,077       -
Relabels   1,428,979       1,365,452       573,130         566,263         -
Updates    -               -               1               1               13
Mergers    533,359         514,992         834,474         817,150         -
Depth      7.4             11.1            1.7             1.7             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_ar
Arc scans  7,494,811    -            11,608,019   8,224,669    13,204,537
Pushes     -            -            1,856,977    1,305,988    304,642
Relabels   -            -            662,983      455,882      -
Updates    2            6            1            1            -





Table 7: Operation counts for hexa instances. 
[Plot: x-axis "Nodes", from 32768 to 524288; legend: matching-pseudoflow, dinic, abmp, pseudo_hi_free, pseudo_lo_free, pr_bim_hi, pr_bim_lo, pseudo_hi_wave, pseudo_lo_lifo, pr_bim_fifo.]









n = 32,768, m = 98,304

Algorithm            Actual   Relative
pseudo_hi_wave       0.354    (2.743)
pseudo_lo_lifo       0.457    (3.540)
pseudo_hi_free       0.262    (2.026)
pseudo_lo_free       0.176    (1.361)
matching-pseudoflow  0.160    (1.237)
abmp                 0.129    (1.000)
dinic                0.482    (3.729)
pr_bim_hi            1.521    (11.769)
pr_bim_lo            0.256    (1.981)
pr_bim_fifo          0.295    (2.280)

n = 65,536, m = 196,608

Algorithm            Actual   Relative
pseudo_hi_wave       0.801    (2.685)
pseudo_lo_lifo       0.973    (3.261)
pseudo_hi_free       0.555    (1.860)
pseudo_lo_free       0.411    (1.376)
matching-pseudoflow  0.346    (1.159)
abmp                 0.298    (1.000)
dinic                0.942    (3.157)
pr_bim_hi            3.797    (12.725)
pr_bim_lo            0.567    (1.900)
pr_bim_fifo          0.703    (2.357)

n = 131,072, m = 393,216

Algorithm            Actual   Relative
pseudo_hi_wave       1.753    (2.485)
pseudo_lo_lifo       2.009    (2.847)
pseudo_hi_free       1.201    (1.703)
pseudo_lo_free       0.830    (1.177)
matching-pseudoflow  0.765    (1.084)
abmp                 0.706    (1.000)
dinic                2.289    (3.244)
pr_bim_hi            11.307   (16.024)
pr_bim_lo            1.194    (1.692)
pr_bim_fifo          1.567    (2.221)

n = 262,144, m = 786,432

Algorithm            Actual   Relative
pseudo_hi_wave       3.667    (2.528)
pseudo_lo_lifo       4.141    (2.855)
pseudo_hi_free       2.515    (1.734)
pseudo_lo_free       1.762    (1.215)
matching-pseudoflow  1.557    (1.073)
abmp                 1.450    (1.000)
dinic                4.490    (3.095)
pr_bim_hi            32.118   (22.144)
pr_bim_lo            2.593    (1.788)
pr_bim_fifo          3.329    (2.295)

n = 524,288, m = 1,572,864

Algorithm            Actual   Relative
pseudo_hi_wave       8.152    (2.614)
pseudo_lo_lifo       9.243    (2.964)
pseudo_hi_free       6.154    (1.973)
pseudo_lo_free       3.343    (1.072)
matching-pseudoflow  3.210    (1.029)
abmp                 3.119    (1.000)
dinic                9.804    (3.143)
pr_bim_hi            52.828   (16.938)
pr_bim_lo            5.088    (1.631)
pr_bim_fifo          6.979    (2.238)



Figure 15: Actual and relative run times for rope instances. 
n = 32,768, m = 98,304

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  450,472         640,668         284,307         185,589         -
Pushes     109,632         177,015         125,871         67,952          -
Relabels   184,689         266,637         33,881          16,449          -
Updates    -               -               4               1               12
Mergers    50,793          59,201          71,126          42,166          -
Depth      2.2             3.0             1.8             1.6             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    pr_bim_fifo
Arc scans  234,915      -            4,807,442    698,951      706,610
Pushes     -            -            948,187      111,356      121,627
Relabels   -            -            265,278      34,320       34,764
Updates    1            5            8            1            1

n = 65,536, m = 196,608

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  928,042         1,286,990       627,108         430,849         -
Pushes     216,963         355,423         278,442         157,025         -
Relabels   383,272         535,995         78,260          42,779          -
Updates    -               -               5               2               13
Mergers    101,391         118,642         155,604         94,896          -
Depth      2.1             3.0             1.8             1.7             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    pr_bim_fifo
Arc scans  447,591      -            10,954,608   1,399,024    1,401,884
Pushes     -            -            2,189,028    223,019      239,673
Relabels   -            -            604,427      68,655       68,626
Updates    1            6            9            1            1

n = 131,072, m = 393,216

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  1,948,058       2,601,575       1,377,293       788,584         -
Pushes     445,041         707,218         658,013         288,835         -
Relabels   812,234         1,085,864       172,964         71,711          -
Updates    -               -               6               1               15
Mergers    203,135         237,263         361,772         177,184         -
Depth      2.2             3.0             1.8             1.6             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    pr_bim_fifo
Arc scans  910,755      -            27,236,410   2,800,225    2,820,512
Pushes     -            -            5,320,568    446,756      484,653
Relabels   -            -            1,493,655    137,355      138,212
Updates    1            6            11           1            1

n = 262,144, m = 786,432

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  3,961,794       5,202,682       2,553,059       1,498,842       -
Pushes     878,458         1,417,061       1,250,749       544,346         -
Relabels   1,657,356       2,171,661       322,220         135,953         -
Updates    -               -               5               1               17
Mergers    406,396         474,495         690,909         337,708         -
Depth      2.2             3.0             1.8             1.6             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    pr_bim_fifo
Arc scans  1,787,955    -            61,856,682   5,608,445    5,617,592
Pushes     -            -            12,354,203   895,346      962,001
Relabels   -            -            3,412,444    275,013      274,651
Updates    1            6            13           1            1

n = 524,288, m = 1,572,864

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  8,149,084       10,433,210      6,288,325       3,043,754       -
Pushes     1,790,739       2,880,818       3,345,473       1,118,770       -
Relabels   3,424,575       4,354,052       814,531         273,814         -
Updates    -               -               7               1               19
Mergers    815,731         951,915         1,803,806       690,455         -
Depth      2.2             3.0             1.9             1.6             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    pr_bim_fifo
Arc scans  3,876,650    -            166,032,545  11,209,692   11,242,609
Pushes     -            -            33,096,689   1,788,801    1,926,834
Relabels   -            -            9,134,295    549,426      549,353
Updates    1            6            17           1            1



Table 8: Operation counts for rope instances. 
[Plot: zipf scaling]

n = 32,768, m = 98,304

Algorithm            Actual   Relative
pseudo_hi_wave       0.079    (1.724)
pseudo_lo_lifo       0.101    (2.211)
pseudo_hi_free       0.046    (1.000)
pseudo_lo_free       0.074    (1.618)
matching-pseudoflow  0.049    (1.075)
abmp                 0.087    (1.912)
dinic                0.133    (2.908)
pr_bim_hi            0.088    (1.930)
pr_bim_lo            0.154    (3.386)
bim_bfs              0.059    (1.294)

n = 65,536, m = 196,608

Algorithm            Actual   Relative
pseudo_hi_wave       0.161    (1.664)
pseudo_lo_lifo       0.217    (2.233)
pseudo_hi_free       0.097    (1.000)
pseudo_lo_free       0.170    (1.755)
matching-pseudoflow  0.105    (1.078)
abmp                 0.186    (1.913)
dinic                0.326    (3.363)
pr_bim_hi            0.194    (2.004)
pr_bim_lo            0.356    (3.674)
bim_bfs              0.144    (1.487)

n = 131,072, m = 393,216

Algorithm            Actual   Relative
pseudo_hi_wave       0.328    (1.703)
pseudo_lo_lifo       0.456    (2.369)
pseudo_hi_free       0.193    (1.000)
pseudo_lo_free       0.344    (1.785)
matching-pseudoflow  0.213    (1.108)
abmp                 0.394    (2.044)
dinic                0.696    (3.616)
pr_bim_hi            0.411    (2.132)
pr_bim_lo            0.768    (3.985)
bim_bfs              0.319    (1.656)

n = 262,144, m = 786,432

Algorithm            Actual   Relative
pseudo_hi_wave       0.657    (1.713)
pseudo_lo_lifo       0.954    (2.489)
pseudo_hi_free       0.383    (1.000)
pseudo_lo_free       0.691    (1.802)
matching-pseudoflow  0.439    (1.146)
abmp                 0.823    (2.148)
dinic                1.589    (4.145)
pr_bim_hi            0.840    (2.190)
pr_bim_lo            1.622    (4.230)
bim_bfs              0.668    (1.742)

n = 524,288, m = 1,572,864

Algorithm            Actual   Relative
pseudo_hi_wave       1.550    (2.034)
pseudo_lo_lifo       2.090    (2.742)
pseudo_hi_free       0.762    (1.000)
pseudo_lo_free       1.458    (1.913)
matching-pseudoflow  0.885    (1.161)
abmp                 1.709    (2.242)
dinic                3.247    (4.261)
pr_bim_hi            1.702    (2.233)
pr_bim_lo            3.316    (4.351)
bim_bfs              1.380    (1.811)



Figure 16: Actual and relative run times for zipf instances. 
n = 32,768, m = 98,304

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  72,958          72,086          93,511          72,960          -
Pushes     21,603          20,103          31,891          22,717          -
Relabels   48,253          48,278          23,393          17,549          -
Updates    -               -                               1               5
Mergers    19,124          17,866          23,446          18,859          -
Depth      1.1             1.1             1.4             1.2             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_bfs
Arc scans  117,350      -            194,865      286,754      143,561
Pushes     -            -            55,897       57,937       12,250
Relabels   -            -            25,683       32,771       -
Updates    1            2                         1            -

n = 65,536, m = 196,608

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  142,930         141,199         194,193         142,776         -
Pushes     41,179          38,480          63,280          43,503          -
Relabels   95,105          94,902          48,395          34,562          -
Updates    -               -                               1               6
Mergers    36,814          34,501          46,292          36,404          -
Depth      1.1             1.1             1.4             1.2             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_bfs
Arc scans  246,974      -            380,736      564,877      282,227
Pushes     -            -            109,182      113,575      23,733
Relabels   -            -            51,179       65,539       -
Updates    1            2                         1            -

n = 131,072, m = 393,216

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  276,323         280,119         410,131         278,417         -
Pushes     78,569          73,818          122,384         83,038          -
Relabels   182,747         196,278         104,001         67,823          -
Updates    -               -                               1               6
Mergers    70,913          66,712          89,809          70,135          -
Depth      1.1             1.1             1.4             1.2             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_bfs
Arc scans  473,993      -            753,217      1,117,726    555,996
Pushes     -            -            216,199      223,694      45,932
Relabels   -            -            103,400      131,075      -
Updates    1            2                         1            -

n = 262,144, m = 786,432

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  540,623         542,210         846,919         542,842         -
Pushes     149,664         141,379         234,925         158,296         -
Relabels   360,906         375,809         215,777         133,381         -
Updates    -               -                               1               6
Mergers    136,426         128,788         173,329         135,015         -
Depth      1.1             1.1             1.4             1.2             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_bfs
Arc scans  950,009      -            1,477,907    2,199,752    1,090,311
Pushes     -            -            426,648      439,407      89,149
Relabels   -            -            208,224      262,147      -
Updates    1            2                         1            -

n = 524,288, m = 1,572,864

           pseudo_hi_wave  pseudo_lo_lifo  pseudo_hi_free  pseudo_lo_free  matching-pseudoflow
Arc scans  1,066,914       1,052,889       1,610,434       1,106,222       -
Pushes     286,868         271,869         436,616         306,668         -
Relabels   726,440         720,922         417,641         275,043         -
Updates    -               -                               1               7
Mergers    263,206         249,177         327,303         262,329         -
Depth      1.1             1.1             1.3             1.2             -

           abmp         dinic        pr_bim_hi    pr_bim_lo    bim_bfs
Arc scans  1,858,122    -            2,888,953    4,367,508    2,164,681
Pushes     -            -            830,304      873,298      173,285
Relabels   -            -            412,878      524,291      -
Updates    1            2                         1            -





Table 9: Operation counts for zipf instances. 
A Proof of Lemma



S3 



Proof: Our analysis of the number of stages is essentially the same as that of Dinic's algorithm as per Even
and Tarjan [IB] and Hopcroft and Karp pT].

By construction, each $\ell$-layered network guarantees at least one successful path, as some WT1 node is
reachable through a sequence of mergers of length $\ell$. We divide the stages into two parts: the first part
includes stages of labels no larger than $\sqrt{\kappa}$, and the second part consists of the stages with labels
greater than $\sqrt{\kappa}$. Since the label of the lowest labeled strong node strictly increases in each stage,
the number of stages in the first part is at most $\sqrt{\kappa}$. We show that the second part can also have at
most $\sqrt{\kappa}$ stages.

In the second part of the algorithm, the successful paths of length $L > \sqrt{\kappa}$ are equivalent to flow
augmentations along a path of length $2L + 1$. We now observe that each WT2 branch contains a residual arc
of capacity 1 from root to child, and the set of residual arcs in the WT2 branches in any layer $p > 0$ of the
network forms a valid cut in the residual graph. This is because it separates the roots in this layer and nodes
with label greater than $p$ from the children in the layer and nodes with label less than $p$, as in Figure 17.

[Figure: layers of the layered network, top to bottom: $\ell$-layer, $(p+1)$-layer (WT2), $p$-layer (WT2), $(p-1)$-layer (WT2), 0-layer.]

Figure 17: Arcs in the branches of a layer form a cut in the residual graph. 

Thus the maximum flow value in the residual graph at the beginning of part two is no larger than the
smallest number of branches in one of the layers. Since the layered network consists of at most $\kappa$ branches
and the number of layers is $L$, the maximum flow in the residual graph can be no larger than $\kappa / L$, which
in the second part is no larger than $\sqrt{\kappa}$. Thus the total number of augmentations in part two is at most
$\sqrt{\kappa}$. Since each layered network guarantees at least one augmentation, there are at most $\sqrt{\kappa}$
stages in the second part of the algorithm. ■
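
The counting argument of this proof can be summarized in one display. This is only a restatement, under the assumption that $\kappa$ denotes the maximum matching value and $L$ the number of layers at a stage of part two:

```latex
\begin{align*}
\text{part one (labels} \le \sqrt{\kappa}\text{):}\quad
  & \#\text{stages} \le \sqrt{\kappa}, \\
\text{part two (}L > \sqrt{\kappa}\text{):}\quad
  & \#\text{stages} \le \#\text{augmentations} \le \frac{\kappa}{L} < \sqrt{\kappa}, \\
\text{total:}\quad
  & \#\text{stages} \le 2\sqrt{\kappa} = O(\sqrt{\kappa}).
\end{align*}
```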
B Complexity using word operations



We show here how to use boolean operations to improve the complexity of the matching-pseudoflow algorithm. 

The characteristic vector of out-neighbors of each $V_1$-node $v$ is maintained as a binary word OUT($v$) of length
$n_2$. OUT($v$) is a word where the $i$-th bit is 1 if there is an arc from $v \in V_1$ to $i \in V_2$. We also maintain a
characteristic vector of in-neighbors as a word IN($v$) of length $n_1$ for each node $v \in V_2$. IN($v$) is a
word where the $i$-th bit is 1 if the arc from $i \in V_1$ to $v \in V_2$ exists.

These words are maintained in addition to the adjacency list which is a linked list of in and out neighbors 
for each node in the graph. The words and the adjacency list are used in parallel to achieve the better time 
complexity of the approach using only words and that using only the adjacency list. When we say that 
the two are used in parallel we imply that the adjacency list and word operations are accessed and used 
alternately. 

Using $\lambda$-bit word operations ($\lambda < n_1$), we break OUT() and IN() into a concatenation of $\lambda$-bit words,
and perform operations on these words. Each of these $\lambda$-bit words is called a $\lambda$-word, and the $j$-th
$\lambda$-word is denoted by OUT$^j$($v$).

Three boolean operations are used: 

1. LEAD: Given a word $W$, lead($W$) returns the index of the leading non-zero bit in $W$, and 0 if all bits
are 0.

2. AND: Given two words $A$ and $B$ of the same length, $A \wedge B$ is a word whose $i$-th bit is 1 iff the $i$-th bits
of $A$ and $B$ are both 1, and 0 otherwise.

3. OR: Given two words $A$ and $B$ of the same length, $A \vee B$ is a word whose $i$-th bit is 1 if the $i$-th bit of
$A$ or $B$ (or both) is 1, and 0 otherwise.

If we wish to perform any of the above operations on a word of $k$ bits using word operations on words of
$\lambda$ bits where $\lambda < k$, each $k$-bit word operation can be done in $O(k/\lambda)$ steps.
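
A minimal sketch of these three operations on a long bit vector stored as a list of $\lambda$-bit words is given below. The names `to_words`, `lead`, `word_and`, `word_or` and the value `LAM = 8` are illustrative assumptions, not taken from the paper; bit 1 is assumed to be the most significant bit of the first word.

```python
from typing import List

LAM = 8  # assumed word size (lambda); kept small so the example is readable


def to_words(bits: List[int], lam: int = LAM) -> List[int]:
    """Pack a 0/1 vector into lam-bit integers (bit 1 = top bit of word 0)."""
    words = []
    for start in range(0, len(bits), lam):
        chunk = bits[start:start + lam]
        chunk += [0] * (lam - len(chunk))  # pad the last word with zeros
        value = 0
        for b in chunk:
            value = (value << 1) | b
        words.append(value)
    return words


def lead(words: List[int], lam: int = LAM) -> int:
    """1-based index of the leading non-zero bit, or 0 if every bit is 0."""
    for j, w in enumerate(words):  # at most ceil(k / lam) word inspections
        if w:
            return j * lam + (lam - w.bit_length() + 1)
    return 0


def word_and(a: List[int], b: List[int]) -> List[int]:
    """Bitwise AND, one machine operation per lam-bit word."""
    return [x & y for x, y in zip(a, b)]


def word_or(a: List[int], b: List[int]) -> List[int]:
    """Bitwise OR, one machine operation per lam-bit word."""
    return [x | y for x, y in zip(a, b)]


a = to_words([0, 1, 1, 0, 0, 0, 0, 0, 0, 1])
b = to_words([0, 0, 1, 0, 0, 0, 0, 0, 0, 1])
print(lead(a), lead(word_and(a, b)), lead(word_or(a, b)))  # 2 3 2
```

Each function touches every $\lambda$-word at most once, which is where the $O(k/\lambda)$ step count for a $k$-bit word comes from.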

Any boolean operation ($\wedge$, $\vee$, lead) on a $\lambda$-word counts as a single operation. Given two nodes $i \in V_1$
and $j \in V_2$, the bits corresponding to the arc $(i, j)$ in IN($j$) and OUT($i$) can be accessed and modified in
$O(1)$.

Initialization For each node $v \in V_1$, we look at the next arc in its out-neighbors in the adjacency
list. If this arc does not lead to an unmatched WT1 node, we perform a lead(OUT$^1$($v$)) operation. If
lead(OUT$^1$($v$)) equals 0, we return to the adjacency list and look at the next arc. Again, if this arc does not
lead to an unmatched $V_2$-node, a lead operation is performed on the next unscanned $\lambda$-word (OUT$^2$($v$)).
This procedure of looking at the next $\lambda$-word and the next arc in the adjacency list continues until an
unmatched $V_2$ neighbor is found, or the end of the list is reached. In the adjacency list, either a neighboring
WT1 branch is found or all the neighbors are exhausted in at most $\kappa$ arc scans for each $V_1$-node. Thus, there
are at most $n_1 \kappa$ arc scans. Further, each arc is looked at at most once, so the complexity is
$O(\min\{n_1 \kappa, m\})$.

In OUT($v$), either a neighboring WT1 branch is found or all the neighbors are exhausted in $O(n_2/\lambda)$
operations. A neighboring WT1 branch, if it exists, is thus found in $O(\min\{n_1 \kappa, m, n_1 n_2/\lambda\})$.

Once a merger is executed, the bits corresponding to the merger arc in the IN() and OUT() words must
be changed. Since there are at most $\kappa$ mergers during initialization, and each requires $O(1)$ work, the work
done to maintain these words is $O(\kappa)$.

Claim B.1 The work done in initialization using $\lambda$-words is $O(\min\{n_1 \kappa, m, n_1 n_2/\lambda\})$.
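
The alternation between the adjacency list and the $\lambda$-words during initialization can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: `find_free_neighbor` is a hypothetical name, nodes of $V_2$ are numbered from 0, and `free_words` is an assumed extra bit mask marking the unmatched $V_2$ nodes, which is ANDed with each word of OUT($v$) before taking the leading bit.

```python
from typing import List, Optional, Tuple


def find_free_neighbor(adj: List[int], out_words: List[int],
                       free_words: List[int], lam: int
                       ) -> Tuple[Optional[int], int]:
    """Alternate one adjacency-list scan with one lam-bit word step.

    adj        -- V2 neighbors of v in adjacency-list order (0-based)
    out_words  -- OUT(v) packed into lam-bit ints, node 0 = top bit of word 0
    free_words -- assumed mask of unmatched V2 nodes, packed the same way
    Returns (unmatched neighbor or None, number of steps used).
    """
    steps = 0
    t = 0
    # Stop as soon as either structure is exhausted: by then every
    # neighbor of v has been inspected through one of the two views.
    while t < len(adj) and t < len(out_words):
        u = adj[t]                      # next arc of the adjacency list
        steps += 1
        if (free_words[u // lam] >> (lam - 1 - u % lam)) & 1:
            return u, steps
        w = out_words[t] & free_words[t]  # next unscanned word of OUT(v)
        steps += 1
        if w:
            return t * lam + (lam - w.bit_length()), steps
        t += 1
    return None, steps


# v has neighbors 0, 1 and 6 among eight V2 nodes, with lam = 4.
print(find_free_neighbor([0, 1, 6], [0b1100, 0b0010], [0b0000, 0b0010], 4))
# (6, 4): node 6 is found by the word scan after four steps
```

The loop runs for at most the smaller of the degree of $v$ and the number of $\lambda$-words, which mirrors the minimum taken in the bound above.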

Building the 1-layer Similar to the initialization, for each child $w$ of a WT2 branch, we search for a
merger arc by looking in parallel at the next neighbor in the adjacency list and performing a lead() operation
on the next $\lambda$-word of OUT($w$). The search terminates either when a WT1 neighbor is found or the end of
the list is reached, which occurs in $\min\{\kappa, n_2/\lambda\}$ operations. Since there are at most $\kappa$ nodes that are
children of a WT2 branch and each arc is scanned at most once in the entire algorithm, the total work to
generate the 1-layer of the layered network throughout the algorithm is $O(\min\{\kappa^2, n_2 \kappa/\lambda, m\})$.

Building the layered network We now describe the use of word operations in generating a layered
network upwards from the 1-layer. With the exception of the $\ell$-layer, all the branches in the layered
network are $WT_2$ branches. We first discuss labeling the $WT_2$ branches, and later discuss how to find
the $\ell$-layer.

We construct words for the sub-graph induced only by the nodes in the $WT_2$ branches. That is, each
node $v \in V_1$ which is the child of a $WT_2$ branch has an associated word SUB-OUT($v$) containing the
subset of its out-neighbor nodes that are in $WT_2$ branches, the length of which is at most $\kappa$. Note
that this is different from OUT($v$), which is a word of length $n_2$ and contains all out-neighbors of $v$,
not just those that are in $WT_2$ branches.

The $i$-th bit of SUB-OUT($v$) is 1 if an arc exists from $v$ to the $i$-th node which is a root of a $WT_2$
branch. Similarly, the roots of the $WT_2$ branches have a word SUB-IN($v$) of size at most $\kappa$
representing the in-neighbors of $v$ that are children in a $WT_2$ branch. We will use SUB-IN() to build the
layered network and SUB-OUT() while pushing flow through this network.

Initially, SUB-IN() and SUB-OUT() are empty since there are no $WT_2$ branches. As $WT_2$ branches
are created during the algorithm, SUB-IN() and SUB-OUT() words are created for each of the nodes in these
branches. At any point in the algorithm, SUB-OUT($v$) is a subset of OUT($v$) and SUB-IN($v$) is a subset of
IN($v$) that contains only those bits that correspond to nodes that are in $WT_2$ branches. The relation
between IN(), OUT(), SUB-IN() and SUB-OUT() is shown in Figure 18. The matrix formed by the SUB-IN() and
SUB-OUT() words is referred to as the SUB-matrix.

Figure 18: IN(), OUT(), SUB-IN() and SUB-OUT() at some point during the algorithm when nodes
$i, j \in V_1$ and $a, b \in V_2$ are in $WT_2$ branches.

Each time a singleton node (either a $WT_1$ or an $ST_1$ branch) becomes part of a $WT_2$ branch, either
during initialization or later in the algorithm, we add a bit corresponding to that node to the existing
SUB-IN() and SUB-OUT() words, and create a new word for that node. This is equivalent to adding a row and
column to the SUB-matrix. The total work throughout the algorithm is $O(\kappa^2)$ since there are
$O(\kappa)$ words each of size $O(\kappa)$, and adding a bit to the words is an $O(1)$ operation.
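Adding a row and a column to the SUB-matrix can be sketched with Python integers as bitsets. This is an illustrative sketch, not the authors' data structure: branches are numbered in creation order, bit $b$ refers to branch $b$, and `add_branch` and `has_arc` are hypothetical names. Each call touches every existing word once, which gives the $O(\kappa)$ work per new branch and $O(\kappa^2)$ overall.

```python
def add_branch(sub_out, sub_in, children, roots, child, root, has_arc):
    """Append the branch (child in V1, root in V2) to the SUB-matrix.

    sub_out[a]: bitset of branch ids whose root is an out-neighbor of child a
    sub_in[a]:  bitset of branch ids whose child is an in-neighbor of root a
    has_arc(u, v): assumed O(1) arc-existence test for u in V1, v in V2
    """
    b = len(children)                 # id of the new branch
    new_out = new_in = 1 << b         # the branch's own arc child -> root
    for a in range(b):
        if has_arc(children[a], root):    # existing child -> new root
            sub_out[a] |= 1 << b
            new_in |= 1 << a
        if has_arc(child, roots[a]):      # new child -> existing root
            new_out |= 1 << a
            sub_in[a] |= 1 << b
    children.append(child)
    roots.append(root)
    sub_out.append(new_out)
    sub_in.append(new_in)
```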

Three more types of words are needed to construct the layered network.

1. A $V_2$LAYER($k$) word (the characteristic vector of each layer) indicating the $V_2$-nodes contained in
each layer $1 \le k \le \kappa$. The length of the word is at most $\kappa$ and a bit of $V_2$LAYER($k$) is 1
if the $V_2$-node corresponding to that bit is in layer $k$. All the $V_2$LAYER() words are set to 0 at the
beginning of each stage.

2. A $V_1$LAYER($k$) word (the characteristic vector of each layer) indicating the $V_1$-nodes contained in
each layer $1 \le k \le \kappa$. The length of the word is at most $\kappa$ and a bit of $V_1$LAYER($k$) is 1
if the $V_1$-node corresponding to that bit is in layer $k$. All the $V_1$LAYER() words are set to 0 at the
beginning of each stage.

3. A REACHED word of size $\kappa$ that keeps track of $V_1$-nodes that have not been reached by the upward
breadth-first search in each stage to create the layered network. The $i$-th bit is 0 if that node has been
assigned to a layer, and 1 otherwise. All the bits of this word are set to 1 at the beginning of each
stage.

Once we have computed the 1-layer of the network, $V_1$LAYER(1) is populated with 1 in the locations of
the nodes in the 1-layer. REACHED is then populated with 0 in the locations of the $V_1$-nodes in the 1-layer.

Figure 19: Finding $V_1$-nodes in the 2-layer from $V_2$LAYER(1) using word operations. (Panel label: step 1,
build $V_1$LAYER(1).)



The $V_2$LAYER(1) word is now constructed by successively looking at each $\lambda$-word in $V_1$LAYER(1)
and performing a lead() operation on that word. If the result of the lead() operation is non-zero, then we
know the index of a $V_1$-node in the 1-layer, and its unique parent's bit is changed in $V_2$LAYER(1). The
1-bit corresponding to the output of the lead() operation is now set to 0, and another lead() operation is
performed on the same word. Identifying a 1-bit and changing it continues until the result of the lead()
operation is zero, in which case we move to the next $\lambda$-word and perform a lead() operation on that
word to find a 1-bit.



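Now, given $V_2$-nodes $\{v_1, \ldots, v_j\}$ in the 1-layer, $V_1$LAYER(2) is

$$V_1\mathrm{LAYER}(2) = \bigl(\mathrm{SUB\text{-}IN}(v_1) \vee \mathrm{SUB\text{-}IN}(v_2) \vee \cdots \vee \mathrm{SUB\text{-}IN}(v_j)\bigr) \wedge \mathrm{REACHED}.$$

Figure 19 illustrates this procedure on an example.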

The $V_2$-nodes in the 2-layer are obtained from $V_1$LAYER(2) (the parent of a $V_1$-node in the 2-layer is a
$V_2$-node in the 2-layer). The REACHED word is updated, and the $V_2$LAYER(2) word is constructed a bit at a
time using the $V_2$-nodes in the 2-layer. As above, $V_1$LAYER(3) is now constructed from the SUB-IN() words
of $V_2$-nodes in the 2-layer and REACHED. This continues until there are no more changes in REACHED.
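The layer-by-layer construction described above can be sketched with Python integers standing in for the words. The sketch assumes 0-indexed bits, a `parent` list mapping each $V_1$-child to the index of its $V_2$-root, and a `sub_in` mapping from root index to a bitset of children; these names and the whole-word (rather than $\lambda$-word) granularity are simplifications for illustration.

```python
def set_bits(word: int):
    """Yield the 0-based indices of the 1-bits of word, lowest first (repeated lead-and-clear)."""
    while word:
        low = word & -word
        yield low.bit_length() - 1
        word ^= low                     # clear the bit just reported


def build_layers(sub_in, parent, v1_layer1, kappa):
    """Return (V1LAYER words, V2LAYER words, final REACHED word) for one stage."""
    reached = (1 << kappa) - 1          # 1 = not yet assigned to a layer
    v1_layers, v2_layers = [], []
    layer = v1_layer1
    while layer:
        reached &= ~layer               # mark this layer's V1-nodes as assigned
        v1_layers.append(layer)
        v2 = 0
        for i in set_bits(layer):       # set the unique parent's bit, one bit at a time
            v2 |= 1 << parent[i]
        v2_layers.append(v2)
        nxt = 0
        for r in set_bits(v2):          # OR of SUB-IN over the V2-nodes of this layer
            nxt |= sub_in[r]
        layer = nxt & reached           # keep only V1-nodes not reached before
    return v1_layers, v2_layers, reached
```

The loop stops when the new layer is empty, which is the point at which REACHED no longer changes.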

Constructing the $V_2$LAYER() words from the $V_1$LAYER() words takes $O(\kappa^2/\lambda)$ work throughout
the stage. The number of word operations that result in finding a 1-bit in a $V_1$LAYER word, and changing
the corresponding bit in the $V_2$LAYER word, is at most $\kappa$, since there are at most $\kappa$ $WT_2$
branches. The number of word operations that result in not finding a 1-bit is $O(\kappa/\lambda)$ for each
layer, since each $V_1$LAYER() word is of length $\kappa$ and we look at the next $\lambda$-word when we do
not find a 1-bit. There are at most $\kappa$ layers, so the work done in finding the $V_2$LAYER() words given
the $V_1$LAYER() words is $O(\kappa^2/\lambda)$ per stage.

An operation is performed on each $\lambda$-word of SUB-IN() at most once for each node in a stage, and
SUB-IN() is of length at most $\kappa$; so the work to generate the layered network is $O(\kappa^2/\lambda)$.
The work to update the REACHED word is $O(\kappa)$ per stage.

At this point, we have the two-edge distances of all $WT_1$ branches. To find the set of $ST_1$ branches
immediately reachable from this set, we use IN() (not SUB-IN(), since we want to reach nodes outside the set
of $WT_2$ branches) and the incoming arcs in the adjacency list, in parallel, for each node to check if an
$ST_1$ branch is reachable from this node. Finding the $\ell$-layer is done analogously to finding the
1-layer. For each node $v$ that is the root of a $WT_2$ branch, an incoming arc in the adjacency list is
scanned for an $ST_1$ neighbor. If no merger is found, a lead() operation is performed on $\lambda$ bits of
IN$_\lambda(v)$ to check for an $ST_1$ neighbor. If no merger is found, the next arc in the adjacency list is
looked at. This procedure of looking at the next arc in the incoming arcs in the adjacency list and
performing a lead() on the next $\lambda$-word of IN($v$) in parallel continues until an $ST_1$ neighbor is
found, or all the neighbors are exhausted. Since IN($v$) is a word of length $n_1$, the end of this word is
reached in $O(n_1/\lambda)$ operations. The end of the adjacency list is reached in at most $\kappa$ arc
scans of the adjacency list. Further, each arc is looked at most once, so the total work done throughout
the algorithm in checking for $ST_1$ neighbors is $O(\min\{\kappa n_1/\lambda, \kappa^2, m\})$.

We also maintain a word VISITED (of length at most $\kappa$) that keeps track of the branches that have been
visited at each stage, i.e., the $i$-th bit of this word is 1 if the root of the branch has not been visited
in that stage. All bits in this word are initially set to 1.

To push flow through the network, we use $V_2$LAYER(), VISITED, and SUB-OUT() to identify a merger
arc. For a node $v \in V_1$ of label $p$, the set of arcs from node $v$ to an unvisited node of layer $p - 1$
is found by $V_2$LAYER($p-1$) $\wedge$ SUB-OUT($v$) $\wedge$ VISITED. A lead() operation on this resultant
word gives a merger arc if it exists.
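A minimal sketch of this merger-arc test, with Python integers standing in for the words: the function name `find_merger_arc` and the 0-based bit indexing are assumptions for illustration, and a real implementation would apply the same AND to one $\lambda$-word at a time.

```python
def find_merger_arc(v2_layer_below: int, sub_out_v: int, visited: int):
    """Return the 0-based index of an unvisited layer-(p-1) root adjacent to v, or None.

    v2_layer_below: V2LAYER(p-1) as a bitset
    sub_out_v:      SUB-OUT(v) as a bitset
    visited:        VISITED, where a 1-bit means "not yet visited in this stage"
    """
    candidates = v2_layer_below & sub_out_v & visited
    if candidates == 0:
        return None
    return (candidates & -candidates).bit_length() - 1   # lead() on the result


def mark_visited(visited: int, root: int) -> int:
    """Clear the bit of a root once its branch has been visited."""
    return visited & ~(1 << root)
```

After a merger through root 2, clearing bit 2 of VISITED ensures that the same branch is not offered again in the stage.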

Each time a merger is found, one more branch becomes visited. Therefore, there can be at most $\kappa$
mergers in each stage. Hence, there are at most $\kappa$ word operations that lead to mergers, which takes
$O(\kappa)$ work. Each time a node is revisited in a stage, the search for mergers starts from the last
$\lambda$-word checked for a merger; so the work done in word operations that do not find a merger is
$O(\kappa/\lambda)$ per node per stage, which is $O(\kappa^2/\lambda)$ total work per stage. Updating VISITED
requires $O(\kappa)$ work throughout the stage. Hence, the work to push flow by executing mergers is
$O(\kappa^2/\lambda)$ per stage.

Since each stage can have at most $\kappa$ successful mergers and there are $O(\sqrt{\kappa})$ stages, the
number of successful mergers is $O(\kappa^{3/2})$. The IN(), OUT(), SUB-IN() and SUB-OUT() words need to be
updated each time a successful merger occurs. Each update takes $O(1)$, so the total work updating these
words is $O(\kappa^{3/2})$.

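Table 10 summarizes our complexity results for our algorithms with word operations.

| Operation | Per stage | Total |
|---|---|---|
| Initialization | | $O(\min\{n_1\kappa, n_1 n_2/\lambda, m\})$ |
| Constructing 1-layer | | $O(\min\{\kappa^2, n_2\kappa/\lambda, m\})$ |
| Constructing $\ell$-layer | | $O(\min\{\kappa n_1/\lambda, \kappa^2, m\})$ |
| Layered network, layers $2, \ldots, \ell - 1$ | $O(\kappa^2/\lambda)$ | $O(\kappa^{2.5}/\lambda)$ |
| Executing mergers | $O(\kappa^2/\lambda)$ | $O(\kappa^{2.5}/\lambda)$ |
| Creating SUB-IN and SUB-OUT | | $O(\kappa^2)$ |
| Updating SUB-IN, SUB-OUT, IN, and OUT | $O(\kappa)$ | $O(\kappa^{3/2})$ |
| TOTAL | | $O(\min\{n_1\kappa, n_1 n_2/\lambda, m\} + \kappa^2 + \kappa^{2.5}/\lambda)$ |

Table 10: Complexity summary of algorithm for bipartite matching with word operations.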
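As a check on the total, the per-stage bounds stated in the text can be multiplied by the $O(\sqrt{\kappa})$ stages and summed with the one-time costs. The following is a short worked derivation under those stated bounds (it assumes the `amsmath` package for `align*`):

```latex
\begin{align*}
\text{per-stage terms:}\quad
  & O(\sqrt{\kappa}) \cdot O\!\left(\kappa^{2}/\lambda\right) = O\!\left(\kappa^{2.5}/\lambda\right),
  \qquad O(\sqrt{\kappa}) \cdot O(\kappa) = O\!\left(\kappa^{3/2}\right) \subseteq O\!\left(\kappa^{2}\right) \\
\text{1-layer and $\ell$-layer:}\quad
  & O\!\left(\min\{\kappa^{2}, n_2\kappa/\lambda, m\}\right)
    + O\!\left(\min\{\kappa n_1/\lambda, \kappa^{2}, m\}\right) \subseteq O\!\left(\kappa^{2}\right) \\
\text{total:}\quad
  & O\!\left(\min\{n_1\kappa, n_1 n_2/\lambda, m\} + \kappa^{2} + \kappa^{2.5}/\lambda\right)
\end{align*}
```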



C An alternative approach

We now show that it is possible to achieve the theoretical complexity of the matching-pseudoflow algorithm
by a clever analysis of Hopcroft and Karp's matching algorithm [21].* Given graph $G = (V_1 \cup V_2, E)$, let
the cardinality of the greedy matching be $\kappa_g$. Denote the nodes in the maximal matching by
$V_g \subseteq V$; then

$|V_g| = 2\kappa_g$.

Lemma C.1 $\kappa \le 2\kappa_g$.

Proof: Every edge in the graph has at least one end point in $V_g$ (otherwise, an edge with neither end
point in $V_g$ can be added to the matching, which contradicts maximality). Therefore, every edge in an
optimal matching must also have at least one end point in $V_g$. Thus, the cardinality of the maximum
matching is bounded by the cardinality of the set $V_g$, which is $2\kappa_g$. ■
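The lemma can be checked on a small instance with the following sketch. The greedy routine scans edges in the given order; the maximum matching is computed with a simple augmenting-path search used here only as a reference (it is not the algorithm analyzed in the paper), and both function names are hypothetical. Edges are pairs $(u, v)$ with $u \in V_1$ and $v \in V_2$, and the two sides use separate index spaces.

```python
def greedy_maximal_matching(edges):
    """Scan edges once, keeping every edge whose end points are both still free."""
    used1, used2, matching = set(), set(), []
    for u, v in edges:
        if u not in used1 and v not in used2:
            matching.append((u, v))
            used1.add(u)
            used2.add(v)
    return matching


def maximum_matching_size(edges):
    """Reference maximum matching size via simple augmenting paths."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    match_of = {}                       # V2-node -> matched V1-node

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_of or try_augment(match_of[v], seen):
                match_of[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in adj)
```

On the edge list `[(1, 0), (0, 0), (1, 1)]` the greedy matching keeps only the first edge, so $\kappa_g = 1$, while the maximum matching has $\kappa = 2 = 2\kappa_g$; the bound of the lemma is tight on this instance.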

For each $v \in V_g$, let $E_g(v)$ denote the set of edges that have one end point in $v$ and the other end
point in another node in $V_g$. We now construct a graph $G^* = (V_1 \cup V_2, E^*)$, where
$E^* \subseteq E$ contains the following edges:

(i) For every node $v \in V_g$ with degree $\le 2\kappa_g$ in $G$, $E^*$ contains all edges adjacent to $v$.

(ii) For every node $v \in V_g$ with degree $> 2\kappa_g$ in $G$, $E^*$ contains all edges in $E_g(v)$ and an
arbitrary subset of $2\kappa_g - |E_g(v)|$ edges adjacent to $v$ that are not in $E_g(v)$. That is, a subset
of $2\kappa_g$ edges adjacent to $v$ that contains all the edges in $E_g(v)$.

* We thank an anonymous referee for this analysis.

Each node $v \in V_g$ in $G^*$ has at most $2\kappa_g$ neighbors by construction. Since every edge is adjacent
to some node in $V_g$ and $|V_g| = 2\kappa_g$, the total number of edges in $E^*$ is at most $4\kappa_g^2$.
Since $\kappa_g \le \kappa$, $E^*$ has $O(\min\{m, \kappa^2\})$ edges.
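The construction of $E^*$ can be sketched as follows, assuming a simple graph given as a list of distinct pairs $(u, v)$ with $u \in V_1$ and $v \in V_2$. The function name `build_g_star` is hypothetical, and the "arbitrary subset" in rule (ii) is taken to be the first edges in input order, which is just one admissible choice.

```python
def build_g_star(edges):
    """Return (kappa_g, E*) for the sparsified graph described by rules (i) and (ii)."""
    # Greedy maximal matching: used1 and used2 together form V_g.
    used1, used2 = set(), set()
    for u, v in edges:
        if u not in used1 and v not in used2:
            used1.add(u)
            used2.add(v)
    kappa_g = len(used1)
    limit = 2 * kappa_g

    # Incident edges per node; nodes are tagged with their side (1 or 2).
    incident = {}
    for u, v in edges:
        incident.setdefault((1, u), []).append((u, v))
        incident.setdefault((2, v), []).append((u, v))

    e_star = set()
    for node in [(1, u) for u in used1] + [(2, v) for v in used2]:
        inc = incident[node]
        if len(inc) <= limit:                     # rule (i): keep everything
            e_star.update(inc)
        else:                                     # rule (ii): keep E_g(v) plus a filler subset
            inside = [e for e in inc if e[0] in used1 and e[1] in used2]
            outside = [e for e in inc if not (e[0] in used1 and e[1] in used2)]
            e_star.update(inside)
            e_star.update(outside[: limit - len(inside)])
    return kappa_g, e_star
```

For a star with ten leaves around node 0 of $V_1$ plus the edge $(1, 0)$, the greedy matching has $\kappa_g = 1$ and the sketch keeps three edges, within the $4\kappa_g^2$ bound.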

Theorem C.1 A maximum matching in $G^*$ has cardinality $\kappa$.

Figure 20: Minimum cut in $G^*_{st}$.



Proof: Let the cardinality of a maximum matching in $G^*$ be denoted by $\kappa^*$. We obtain the maximum
matching by solving for a minimum cut in a graph $G^*_{st}$ obtained by adding a source node $s$, a sink node
$t$, and unit capacity arcs from $s$ to all nodes in $V_1$ and from all nodes in $V_2$ to $t$. Let the source
set of the minimum cut in $G^*_{st}$ be $S^* = \{s\} \cup S_1 \cup S_2$ and the sink set be
$T^* = \{t\} \cup T_1 \cup T_2$, where $S_1 \cup T_1 = V_1$ and $S_2 \cup T_2 = V_2$, as shown in Figure 20.
Then, the capacity of this minimum cut is $\kappa^*$, i.e., $|T_1| + |S_2| = \kappa^*$.

The maximum matching in $G$ is similarly obtained by solving for a minimum cut in a graph $G_{st}$ (obtained
by adding a source and a sink node, and arcs adjacent to the source and sink); this minimum cut has capacity
$\kappa \le n_1 \le n_2$.

Suppose (for contradiction) that $\kappa^* < \kappa$. Then, $S_1$, $S_2$, $T_1$, and $T_2$ are non-empty (if
any of these sets were empty, then $\kappa^* = n_1$, which contradicts the assumption that
$\kappa^* < \kappa \le n_1$). The minimum $s,t$-cut $(S^*, T^*)$ in $G^*_{st}$ cannot be a finite cut in
$G_{st}$ since it has a capacity strictly less than the minimum cut in $G_{st}$. Then, there exists some arc
$(i,j)$ in $G_{st}$ but not in $G^*_{st}$ such that $i \in S_1$ and $j \in T_2$. Since the arc $(i,j)$ was
removed from $G$ to generate $G^*$, it means that node $i$ has exactly $2\kappa_g$ neighbors in $G^*$.
Further, since $i$ is in the source set of a finite cut in $G^*_{st}$, all the neighbors of $i$ must belong
to $S_2$. That is, $|S_2| \ge 2\kappa_g$.

We have shown that $\kappa^* > |S_2|$ since $T_1$ is non-empty. Therefore,
$\kappa^* > |S_2| \ge 2\kappa_g \ge \kappa$, contradicting the assumption that $\kappa^* < \kappa$.



The above observations and theorem suggest the following algorithm:

1. Generate a maximal matching (takes $O(\min\{m, n_1\kappa\})$ work).

2. Construct graph $G^*$ as described above (takes $O(\min\{m, \kappa^2\})$ work).

3. Solve for a maximum matching using the Hopcroft-Karp algorithm. Since the number of edges in $G^*$
is $O(\min\{m, \kappa^2\})$, and the number of nodes is $O(\kappa)$, the complexity is
$O(\sqrt{\kappa}\min\{m, \kappa^2\})$.

D Test instances

The descriptions of these instances are reproduced from Cherkassky et al. [12].

1. Fewg and manyg: These are random bipartite graphs where the vertices of each partition, $V_1$ and $V_2$,
are divided into $k$ groups of equal size. For each vertex of the $j$-th group of $V_1$ the generator chooses
$y$ random neighbors from the $(i-1)$-th through $(i+1)$-th groups of $V_2$ (with wrap-around), where $y$ is
binomially distributed with mean $d$ (thus $d$ is the mean vertex degree). The indices $i$ and $j$ are not
related because vertices in $V_1$ are randomly shuffled before neighbors in $V_2$ are assigned. The two
families we consider are fewg, where there are 32 groups, and manyg, where there are 256 groups; both have
$d = 5$.

These classes were designed having in mind problems that can be reduced to bipartite matching, such 
as the maximum vertex-disjoint paths problem. In these problems the resulting graph in the reduction 
is bipartite, but if the original graph is planar or nearly planar each vertex will only have as neighbors 
vertices in the surrounding area. 

2. Hilo: The hi-lo family of bipartite matching problems was designed to separate high and low vertex
selection strategies for the push-relabel method. This generator creates a graph with a unique perfect
matching and has been motivated by a generator of Kennedy [52].

Let $G = (V_1 \cup V_2, E)$ be a graph produced by this generator. This graph is defined by three parameters,
$\ell$, $k$, and $d$. Vertices of $V_1$ are partitioned into $\ell$ groups, each containing $k$ vertices. For
$1 \le i \le k$, $1 \le j \le \ell$, we refer to the $i$-th vertex in group $j$ by $x_i^j$. Vertices of $V_2$
are partitioned similarly, and $y_i^j$ is defined similarly to $x_i^j$. Each vertex $x_i^j$ is connected to
vertices $y_p^j$ for $\max(1, i - d) \le p \le i$ and, if $j < \ell$, to vertices $y_p^{j+1}$ for
$\max(1, i - d) \le p \le i$.
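The hi-lo construction is deterministic and can be sketched directly from the description above. This is not the generator of Cherkassky et al.; the function name `hilo_edges` and the encoding of $x_i^j$ and $y_p^j$ as `(group, index)` pairs are assumptions for illustration.

```python
def hilo_edges(l: int, k: int, d: int):
    """Edges of a hi-lo graph with l groups of k vertices and window parameter d.

    Each edge is ((j, i), (g, p)): vertex x_i^j of V1 joined to vertex y_p^g of V2,
    with 1-based group and vertex indices.
    """
    edges = []
    for j in range(1, l + 1):
        for i in range(1, k + 1):
            for p in range(max(1, i - d), i + 1):
                edges.append(((j, i), (j, p)))           # same group
                if j < l:
                    edges.append(((j, i), (j + 1, p)))   # next group
    return edges
```

With `l=2, k=3, d=1` this produces 15 edges, and every $x_i^j$ is joined to its counterpart $y_i^j$, which is consistent with the existence of a perfect matching.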

3. Grid: In class grid, each vertex $u \in V_1$ is connected to vertices
$\{u+1, u-1, u+a, u-a, u+b, u-b, \ldots\}$ where $\{1, a, b, \ldots\}$ is a geometric progression. In our
tests, we set the average degree of each node to 6.

4. Hexa: In class hexa, the vertices on each side are divided into $n/b$ blocks of size $b$. One random
bipartite hexagon is added between each block $i$ on one side and each of the blocks $i + k$ on the other
side, with $|k| < K$ for some $K$. The parameters $b$ and $K$ are chosen by the program in such a way that
the average degree is correct (i.e., $3K/b = d$) but few pairs of hexagons have more than one vertex in
common. In our tests, we set $d = 6$.

5. Rope: For the class rope, the vertices on each side are grouped into $t = n/d$ blocks of size $d$, numbered
$V_1^0, \ldots, V_1^{t-1}$ and $V_2^0, \ldots, V_2^{t-1}$. Block $j$ on one side is connected to block $j + 1$
on the other side, for $j = 0, 1, \ldots, t - 2$; block $V_1^{t-1}$ is connected to block $V_2^{t-1}$. Thus,
the graph is a "rope" that is folded and twisted over itself, so that it zig-zags between the two sides,
first up and then down. Consecutive pairs of blocks along the "rope" are connected alternately by perfect
matchings ("m-type arcs") and random bipartite graphs of average degree $d - 1$ ("r-type arcs"), beginning
and ending with perfect matchings. The only maximum matching is a perfect one, consisting of all m-type
arcs. In our tests, we set $d = 6$.

6. Zipf: Each member of class zipf is a random bipartite graph where the arc between the $i$-th $V_1$-node
and the $j$-th $V_2$-node has nominal probability roughly proportional to $1/(ij)$. Thus the graph is denser
near the "core" vertices (those with small index), and thins out slowly towards the "periphery" (vertices
with high index). In our experiments we set $d = 6$.